re.split() on a long HTML list

I have been struggling with this problem for a while. Basically, I need to split a long HTML list (I won't paste it here because it is massive). I tried the str.split() method, but it only takes one separator at a time, so I found the re.split() function. But here is the thing: when I try to split with it, it throws this at me:
Traceback (most recent call last):
  File "/root/Plocha/FlaskWebsiteHere/Mrcasa_na_C.py", line 34, in <module>
    a = re.split(' / |</h3><p>|: 1\) ', something_paragraf[0])
  File "/usr/lib/python3.5/re.py", line 203, in split
    return _compile(pattern, flags).split(string, maxsplit)
TypeError: expected string or bytes-like object
I have tried to resolve this, but nothing works. Please help!
SnailingGoat

Maybe something like this? The TypeError comes from passing a bs4 element to re.split(), which expects a string, so convert the result set to text first:
import re

something_paragraf = soup.find_all("div", {"class": "ukolRes"})
# convert the 'bs4.element.ResultSet' to a single `str`
html = ''.join([s.text for s in something_paragraf])
# modify this regex split to suit your needs:
# split the string on punctuation followed by spaces
multiple_strings = re.split('(?<=[.,!/?]) +', html)
print(multiple_strings)
output:
['Matematika /', 'splnit do 21.', 'září (čtvrtek)\nnaučit se bezpodmínečně všechny tři mocninné vzorce\núkol byly příklady v sešitě na rozložení a složení vzorců\n\nAnglický jazyk /'
...
'lepidlo Kores,', 'tužky,', 'propiska či pero (modré)\n']
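For completeness, the original multi-delimiter pattern also works once the element is converted to a string. A quick sketch along those lines, reusing the soup result set from above; note that str() keeps the raw HTML, so markup delimiters like </h3><p> still match:

import re

# re.split() needs a str, but something_paragraf[0] is a bs4 Tag
html0 = str(something_paragraf[0])
a = re.split(r' / |</h3><p>|: 1\) ', html0)
print(a)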


ib_insync "contract can't be hashed"

A process I have that's been running for a couple of years suddenly stopped working.
I had avoided updating much in the way of Python and packages precisely to avoid that.
I've now updated ib_insync to the latest version, and there is no improvement. Debugging a little gives me this.
The code:
import ib_insync as ibis
ib = ibis.IB()
contract = ibis.Contract()
contract.secType = 'STK'
contract.currency = 'USD'
contract.exchange = 'SMART'
contract.localSymbol = 'AAPL'
ib.qualifyContracts(contract)
Result:
File "/Users/macuser/anaconda3/lib/python3.6/site-packages/ib_insync/client.py", line 244, in send
if field in empty:
File "/Users/macuser/anaconda3/lib/python3.6/site-packages/ib_insync/contract.py", line 153, in hash
raise ValueError(f'Contract {self} can't be hashed')
ValueError: Contract Contract(secType='STK', exchange='SMART', currency='USD', localSymbol='AAPL') can't be hashed
Exception ignored in: <bound method IB.del of <IB connected to 127.0.0.1:7497 clientId=6541>>
Traceback (most recent call last):
File "/Users/macuser/anaconda3/lib/python3.6/site-packages/ib_insync/ib.py", line 233, in del
File "/Users/macuser/anaconda3/lib/python3.6/site-packages/ib_insync/ib.py", line 281, in disconnect
File "/Users/macuser/anaconda3/lib/python3.6/logging/init.py", line 1306, in info
File "/Users/macuser/anaconda3/lib/python3.6/logging/init.py", line 1442, in _log
File "/Users/macuser/anaconda3/lib/python3.6/logging/init.py", line 1452, in handle
File "/Users/macuser/anaconda3/lib/python3.6/logging/init.py", line 1514, in callHandlers
File "/Users/macuser/anaconda3/lib/python3.6/logging/init.py", line 863, in handle
File "/Users/macuser/anaconda3/lib/python3.6/logging/init.py", line 1069, in emit
File "/Users/macuser/anaconda3/lib/python3.6/logging/init.py", line 1059, in _open
NameError: name 'open' is not defined
| => python --version
Python 3.6.4 :: Anaconda, Inc.
I am not OP, but have had a similar problem. I am attempting to send an order to IB using ib_insync.
contract = Stock('DKS', 'SMART', 'USD')
order = LimitOrder('SELL', 1, 1)
try:
    ib.qualifyContracts(contract)
    trade = ib.placeOrder(contract, order)
except Exception as e:
    print(e)
Here is the exception that is returned:
Contract Stock(symbol='DKS', exchange='SMART', currency='USD') can't be hashed
I understand that lists and other mutable objects can't be hashed, but I don't understand why this wouldn't work. It pretty clearly follows the examples in the ib_insync docs.
FYI - I was able to resolve this issue by updating to the latest ib_insync version. Perhaps this will help you as well.
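For reference, upgrading in place is usually just the following, assuming a pip-managed environment:

pip install --upgrade ib_insync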

Writing to a .json file

I am currently trying to write two pieces of user input to a .json file without clearing the existing data. I believe the problem is with logins.append, as the error says there is no such attribute. What would I have to use instead?
I have been searching around trying to find the right method to call on logins.
import json

def i():
    path_to_json = "./logins.json"
    with open("logins.json", "r") as content:
        logins = json.load(content)
    with open('logins.json', 'a') as outfile:
        username = str(input('New Username: '))
        password = str(input('New Password: '))
        logins.append({username: password})
I am getting the error:
Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    i()
  File "N:\NEA Computing\NEA code.py", line 188, in i
    logins.append({username: password})
AttributeError: 'dict' object has no attribute 'append'
I am expecting it to add data to the .json file without deleting the other data; however, I am getting an error and nothing is being written to the .json file.
Try to open the file with the 'w' option:
def i():
    path_to_json = "./logins.json"
    with open("logins.json", "w") as content:
        logins = json.load(content)
    with open('logins.json', 'a') as outfile:
        username = str(input('New Username: '))
        password = str(input('New Password: '))
        logins.append({username: password})
Maybe this is a misunderstanding of the question, but what is happening, as far as I can see, is that json.load parses an object literal like {key1: value1, key2: value2, ...} into a Python dict, not an array literal like [value1, value2, ...] into a list. A dict has no append method; only a list does. So you get your error and execution halts. When the content of logins.json is an object {...}, it doesn't work for me, but as an array/list [...] it works.
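For what it's worth, a minimal read-modify-write sketch, assuming logins.json holds a JSON object mapping usernames to passwords (the function name add_login is just for illustration):

import json

def add_login(path='logins.json'):
    # Read the existing data first, so nothing is lost.
    with open(path, 'r') as f:
        logins = json.load(f)  # a dict, since the file holds {...}
    username = input('New Username: ')
    password = input('New Password: ')
    logins[username] = password  # dicts use key assignment, not append
    # Rewrite the whole file with the updated data.
    with open(path, 'w') as f:
        json.dump(logins, f, indent=2)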

Read a file in R with mixed character encodings

I'm trying to read tables into R from HTML pages that are mostly encoded in UTF-8 (and declare <meta charset="utf-8">) but have some strings in some other encodings (I think Windows-1252 or ISO 8859-1). Here's an example. I want everything decoded properly into an R data frame. XML::readHTMLTable takes an encoding argument but doesn't seem to allow one to try multiple encodings.
So, in R, how can I try several encodings for each line of the input file? In Python 3, I'd do something like:
with open('file', 'rb') as o:
    for line in o:
        try:
            line = line.decode('UTF-8')
        except UnicodeDecodeError:
            line = line.decode('Windows-1252')
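Generalizing the same idea to an ordered list of candidate encodings would look something like this sketch (the particular list of encodings is just an example):

def decode_line(raw, encodings=('utf-8', 'windows-1252', 'shift_jis')):
    # Try each encoding in order; the first clean decode wins.
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError('encoding not detected')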
There do seem to be R library functions for guessing character encodings, like stringi::stri_enc_detect, but when possible, it's probably better to use the simpler deterministic method of trying a fixed set of encodings in order. It looks like the best way to do this is to take advantage of the fact that when iconv fails to convert a string, it returns NA.
linewise.decode = function(path)
    sapply(readLines(path), USE.NAMES = F, function(line) {
        if (validUTF8(line))
            return(line)
        l2 = iconv(line, "Windows-1252", "UTF-8")
        if (!is.na(l2))
            return(l2)
        l2 = iconv(line, "Shift-JIS", "UTF-8")
        if (!is.na(l2))
            return(l2)
        stop("Encoding not detected")
    })
If you create a test file with
$ python3 -c 'with open("inptest", "wb") as o: o.write(b"This line is ASCII\n" + "This line is UTF-8: I like π\n".encode("UTF-8") + "This line is Windows-1252: Müller\n".encode("Windows-1252") + "This line is Shift-JIS: ハローワールド\n".encode("Shift-JIS"))'
then linewise.decode("inptest") indeed returns
[1] "This line is ASCII"
[2] "This line is UTF-8: I like π"
[3] "This line is Windows-1252: Müller"
[4] "This line is Shift-JIS: ハローワールド"
To use linewise.decode with XML::readHTMLTable, just say something like XML::readHTMLTable(linewise.decode("http://example.com")).

How to load a json file with strings including double quotes (")

I've been given a load of JSON files which I'm trying to load into Python 3.5.
I've already had to do some cleanup work, removing double backslashes and extra quotation marks; however, I've run into an issue I don't know how to solve.
I'm running the following code:
with open(filepath, 'r') as json_file:
    reader = json_file.readlines()
    for row in reader:
        row = row.replace('\\', '')
        row = row.replace('"{', '{')
        row = row.replace('}"', '}')
        response = json.loads(row)
        for i in response:
            responselist.append(i['ActionName'])
However, it's throwing this error:
JSONDecodeError: Expecting ',' delimiter: line 1 column 388833 (char 388832)
The part of the JSON that's causing the issue is the status text entry below:
"StatusId":8,
"StatusIdString":"UnknownServiceError",
"StatusText":"u003cCompany docTypeu003d"Mobile.Tile" statusIdu003d"421" statusTextu003d"Start time of 11/30/2015 12:15:00 PM is more than 5 minutes in the past relative to the current time of 12/1/2015 12:27:01 AM." copyrightu003d"Copyright Company Inc." versionNumberu003d"7.3" createdDateu003d"2015-12-01T00:27:01Z" responseIdu003d"e74710c0-dc7c-42db-b608-bf905d95d153" /u003e",
"ActionName":"GetTrafficTile"
I added the line breaks to illustrate my point; it looks like Python is unhappy that the string contains double quotes.
I have a feeling this may be due to my replacing '\\' with '' messing with the Unicode escapes in the string. Is there any way to repair these nested strings? I don't mind if the StatusText field is deleted completely; all I'm after is a list of the ActionName fields.
EDIT:
I've hosted an example file here:
https://www.dropbox.com/s/1oanrneg3aqandz/2015-12-01T00%253A00%253A42.527Z_2015-12-01T00%253A01%253A17.478Z?dl=0
This is exactly as I received it, before I replaced the extra backslashes and quotations.
Here is a pared-down version of the sample with one bad entry:
["{\"apiServerType\":0,\"RequestId\":\"52a65260-1637-4653-a496-7555a2386340\",\"StatusId\":0,\"StatusIdString\":\"Ok\",\"StatusText\":null,\"ActionName\":\"GetCameraImage\",\"Url\":\"http://mosi-prod.cloudapp.net/api/v1/GetCameraImage?AuthToken=vo*AB57XLptsKXf0AzKjf1MOgQ1hZ4BKipKgYl3uGew%7C&CameraId=13782\",\"Lat\":0.0,\"Lon\":0.0,\"iVendorId\":12561,\"iConsumerId\":2986897,\"iSliverId\":51846,\"UserId\":\"2986897\",\"HardwareId\":null,\"AuthToken\":\"vo*AB57XLptsKXf0AzKjf1MOgQ1hZ4BKipKgYl3uGew|\",\"RequestTime\":\"2015-12-01T00:00:42.5278699Z\",\"ResponseTime\":\"2015-12-01T00:01:02.5926127Z\",\"AppId\":null,\"HttpMethod\":\"GET\",\"RequestHeaders\":\"{\\\"Connection\\\":[\\\"keep-alive\\\"],\\\"Via\\\":[\\\"HTTP/1.1 nycnz01msp1ts10.wnsnet.attws.com\\\"],\\\"Accept\\\":[\\\"application/json\\\"],\\\"Accept-Encoding\\\":[\\\"gzip\\\",\\\"deflate\\\"],\\\"Accept-Language\\\":[\\\"en-us\\\"],\\\"Host\\\":[\\\"mosi-prod.cloudapp.net\\\"],\\\"User-Agent\\\":[\\\"Traffic/5.4.0\\\",\\\"CFNetwork/758.1.6\\\",\\\"Darwin/15.0.0\\\"]}\",\"RequestContentHeaders\":\"{}\",\"RequestContentBody\":\"\",\"ResponseBody\":null,\"ResponseContentHeaders\":\"{\\\"Content-Type\\\":[\\\"image/jpeg\\\"]}\",\"ResponseHeaders\":\"{}\",\"MiniProfilerJson\":null}"]
The problem is a little different than you think. Whatever program built these files used data that was already json-encoded and ended up double and even triple encoding some of the information. I peeled it apart in a shell session and got usable python data. You can (1) go dope-slap whoever wrote the program that built this steaming pile of... um... goodness? and (2) manually scan through and decode inner json strings.
I decoded the data and it was a list of strings, but those strings looked suspiciously like json
>>> data = json.load(open('test.json'))
>>> type(data)
<class 'list'>
>>> d0 = data[0]
>>> type(d0)
<class 'str'>
>>> d0[:70]
'{"apiServerType":0,"RequestId":"52a65260-1637-4653-a496-7555a2386340",'
Sure enough, I can decode it
>>> d0_1 = json.loads(d0)
>>> type(d0_1)
<class 'dict'>
>>> d0_1
{'ResponseBody': None, 'StatusText': None, 'AppId': None, 'ResponseTime': '2015-12-01T00:01:02.5926127Z', 'HardwareId': None, 'RequestTime': '2015-12-01T00:00:42.5278699Z', 'StatusId': 0, 'Lon': 0.0, 'Url': 'http://mosi-prod.cloudapp.net/api/v1/GetCameraImage?AuthToken=vo*AB57XLptsKXf0AzKjf1MOgQ1hZ4BKipKgYl3uGew%7C&CameraId=13782', 'RequestContentBody': '', 'RequestId': '52a65260-1637-4653-a496-7555a2386340', 'MiniProfilerJson': None, 'RequestContentHeaders': '{}', 'ActionName': 'GetCameraImage', 'StatusIdString': 'Ok', 'HttpMethod': 'GET', 'iSliverId': 51846, 'ResponseHeaders': '{}', 'ResponseContentHeaders': '{"Content-Type":["image/jpeg"]}', 'apiServerType': 0, 'AuthToken': 'vo*AB57XLptsKXf0AzKjf1MOgQ1hZ4BKipKgYl3uGew|', 'iConsumerId': 2986897, 'RequestHeaders': '{"Connection":["keep-alive"],"Via":["HTTP/1.1 nycnz01msp1ts10.wnsnet.attws.com"],"Accept":["application/json"],"Accept-Encoding":["gzip","deflate"],"Accept-Language":["en-us"],"Host":["mosi-prod.cloudapp.net"],"User-Agent":["Traffic/5.4.0","CFNetwork/758.1.6","Darwin/15.0.0"]}', 'iVendorId': 12561, 'Lat': 0.0, 'UserId': '2986897'}
Picking one of the entries, that looks like more json
>>> hdrs = d0_1['RequestHeaders']
>>> type(hdrs)
<class 'str'>
Yep, it decodes to what I want
>>> hdrs_0 = json.loads(hdrs)
>>> type(hdrs_0)
<class 'dict'>
>>>
>>> hdrs_0["Via"]
['HTTP/1.1 nycnz01msp1ts10.wnsnet.attws.com']
>>>
>>> type(hdrs_0["Via"])
<class 'list'>
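Putting that session together, a short sketch that unwraps the double-encoded entries and collects the ActionName fields (the file name is from the session above; it assumes every element decodes like the sample):

import json

with open('test.json') as f:
    data = json.load(f)          # outer layer: a list of JSON strings

actions = []
for entry in data:
    record = json.loads(entry)   # inner layer: each string is itself JSON
    actions.append(record['ActionName'])

print(actions)                   # e.g. ['GetCameraImage']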
Here you are :) :
responselist = []
with open('dataFile.json', 'r') as json_file:
    reader = json_file.readlines()
    for row in reader:
        strActNm = 'ActionName":"'; lenActNm = len(strActNm)
        actionAt = row.find(strActNm)
        while actionAt > 0:
            nxtQuotAt = row.find('"', actionAt + lenActNm + 2)
            responselist.append(row[actionAt-1 : nxtQuotAt+1])
            actionAt = row.find('ActionName":"', nxtQuotAt)
print(responselist)
which gives:
>python3.6 -u "dataFile.py"
['"ActionName":"GetTrafficTile"']
>Exit code: 0
where dataFile.json is the file with the line you provided and dataFile.py the code provided above.
It's the hard way, but if the files are in a bad format you have to find a way around it, and simple pattern matching works in any case. For more complex cases you would need regex (regular expressions) — see the sketch at the end of this answer — but here a simple .find() is enough to do the job.
The code also finds multiple "actions" in the line (if the line contains more than one action).
Here is the result for the file you provided in your link, using the following small modification of the code above:
responselist = []
with open('dataFile1.json', 'r') as json_file:
    reader = json_file.readlines()
    for row in reader:
        strActNm = '\\"ActionName\\":\\"'
        # strActNm = 'ActionName":"'
        lenActNm = len(strActNm)
        actionAt = row.find(strActNm)
        while actionAt > 0:
            nxtQuotAt = row.find('"', actionAt + lenActNm + 2)
            responselist.append(row[actionAt : nxtQuotAt+1].replace('\\', ''))
            actionAt = row.find('ActionName":"', nxtQuotAt)
print(responselist)
gives:
>python3.6 -u "dataFile.py"
['"ActionName":"GetCameraImage"']
>Exit code: 0
where dataFile1.json is the file you provided in the link.
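As mentioned above, more complex cases call for a regex; here is the same extraction done with one. The pattern is an assumption tuned to the sample, tolerating both the clean and the backslash-escaped form of the key:

import re

# Optional backslashes around the quotes handle both "ActionName":"..."
# and \"ActionName\":\"...\" as they appear in the raw file.
pattern = re.compile(r'ActionName\\?":\\?"(.*?)\\?"')
with open('dataFile1.json') as f:
    actions = pattern.findall(f.read())
print(actions)   # e.g. ['GetCameraImage']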

World of Tanks Python list comparison from JSON

OK, I am trying to create a function which will read a list of IDs from an external JSON file, which it is doing. It's even putting the data into the database on load of the program. My issue is this: I can't seem to match the list IDs in a comparison. Here is my current code:
def check(account):
    global ID_account
    import json, httplib
    if not hasattr(BigWorld, 'iddata'):
        UID_DB = account['databaseID']
        UID = ID_account
        try:
            conn = httplib.HTTPConnection('URL')
            conn.request('GET', '/ids.json')
            conn.sock.settimeout(2)
            resp = conn.getresponse()
            qresp = resp.read()
            BigWorld.iddata = json.loads(qresp)
            LOG_NOTE('[ABRO] Request of URL data successful.')
            conn.close()
        except:
            LOG_NOTE('[ABRO] Http request to URL problem. Loading local data.')
        if UID_DB is not None:
            list = BigWorld.iddata["ids"]
            #print (len(list) - 1)
            for n in range(0, (len(list) - 1)):
                #print UID_DB
                #print list[n]
                if UID_DB == list[n]:
                    #print '[ABRO] userid located:'
                    #print UID_DB
                    UID = UID_DB
        else:
            LOG_NOTE('[ABRO] userid not set.')
        if 'databaseID' in account and account['databaseID'] != UID:
            print '[ABRO] Account not active in database, game closing...... '
            BigWorld.quit()
Now my JSON file looks like this:
{
    "ids": [
        "1001583757",
        "500687699",
        "000000000"
    ]
}
Now when I run this with all the commented-out prints, it seems to execute perfectly fine up until it tries to do the match inside the for loop. Even when the prints show UID_DB and list[n] having the same values, it does not set my variable; it doesn't post any errors, it simply acts as if there was no match. Am I possibly missing a loop break? Here is the Python log, starting with the print of the length of the list:
INFO: 2
INFO: 1001583757
INFO: 1001583757
INFO: 1001583757
INFO: 500687699
INFO: [ABRO] Account not active, game closing......
As you can see from the log, it never prints the 'userid located' message, so it is not matching them. It just continues with the loop and uses the default ID I defined above the function. Anyone with an idea would definitely help me out, as I've been poking and prodding at this thing for three days now.
The answer to this was found by @VikasNehaOjha. It was simply missing a type conversion before the comparison, which I fixed by adding
list[n] = int(list[n])
That resolved my issue, and the comparisons finally matched.
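For illustration, the underlying mismatch in miniature (IDs taken from the JSON above; that databaseID arrives as an int is an assumption, but it is what the accepted fix implies):

ids = ["1001583757", "500687699", "000000000"]
uid_db = 1001583757              # databaseID presumably arrives as an int
print(uid_db == ids[0])          # False: an int never equals a str
print(uid_db == int(ids[0]))     # True once the string is converted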