I have a bunch of HTML reports titled "ReportHistory-****", where **** is an account number. These are reports that MT5 (a trading app) sends periodically to an FTP server.
from bs4 import BeautifulSoup
my_html = open("Reports/ReportHistory-117339.html", "rb")
soup = BeautifulSoup(my_html, 'lxml')
table_tags = soup.find_all('table')
for table in table_tags:
    print(table.text)
When I run this, I get the following info printed, along with every single trade that occurred, which is irrelevant for my purpose:
Name:
Little John Norris
Account:
117339 (USD, M4Markets-MT5, demo, Hedge)
Company:
Trinota Markets Ltd
Date:
2023.01.12 03:53
Balance:
100 751.00
Free Margin:
100 751.00
Equity:
100 751.00
Total Net Profit:
751.00
Gross Profit:
39 704.50
Gross Loss:
-38 953.50
Balance Drawdown Absolute:
4 253.00
I want this info added to a DataFrame so I can store it and pull it into another table.
If I add the code below, it errors:
import pandas as pd

data = []
df = pd.DataFrame(table.text, columns=['col1', 'col2'])
print(df)
It prints this traceback error:
Traceback (most recent call last):
File "/Users/PycharmProjects/readhtml/reports.py", line 13, in <module>
df=pd.DataFrame(table.text)
File "/Users/PycharmProjects/readhtml/venv/lib/python3.9/site-packages/pandas/core/frame.py", line 780, in __init__
raise ValueError("DataFrame constructor not properly called!")
ValueError: DataFrame constructor not properly called!
followed by the regular table.text data, and then:
Process finished with exit code 1
My end goal is to be able to run the Python code over each of the HTML reports and get the same relevant info from each of them, associated with the account number.
Is this possible?
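One possible approach is sketched below. This is only an illustration: the assumption that each summary field sits in a tr whose first td is a label ending in ':' and whose second td is the value, and the ReportHistory-*.html glob pattern, are guesses about the report layout rather than something confirmed by the files.
import glob
import re
import pandas as pd
from bs4 import BeautifulSoup

rows = []
for path in glob.glob("Reports/ReportHistory-*.html"):
    # pull the account number out of the file name
    account = re.search(r"ReportHistory-(\d+)", path).group(1)
    with open(path, "rb") as f:
        soup = BeautifulSoup(f, "lxml")
    record = {"Account": account}
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # keep only label/value rows such as "Balance:" / "100 751.00"
        if len(cells) >= 2 and cells[0].endswith(":"):
            record[cells[0].rstrip(":")] = cells[1]
    rows.append(record)

df = pd.DataFrame(rows)   # one row per report, one column per summary field
print(df)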
I'm trying to load a transformer model with SentenceTransformer. Below is the code:
# Now we create a SentenceTransformer model from scratch
word_emb = models.Transformer('paraphrase-mpnet-base-v2')
pooling = models.Pooling(word_emb.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_emb, pooling])
Below is the error
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_2948\3254427654.py in <module>
1 # Now we create a SentenceTransformer model from scratch
----> 2 word_emb = models.Transformer('paraphrase-mpnet-base-v2')
3 pooling = models.Pooling(word_emb.get_word_embedding_dimension())
4 model = SentenceTransformer(modules=[word_emb, pooling])
~\miniconda3\envs\atoti\lib\site-packages\sentence_transformers\models\Transformer.py in __init__(self, model_name_or_path, max_seq_length, model_args, cache_dir, tokenizer_args, do_lower_case, tokenizer_name_or_path)
27
28 config = AutoConfig.from_pretrained(model_name_or_path, **model_args, cache_dir=cache_dir)
---> 29 self._load_model(model_name_or_path, config, cache_dir)
30
31 self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_name_or_path if tokenizer_name_or_path is not None else model_name_or_path, cache_dir=cache_dir, **tokenizer_args)
~\miniconda3\envs\atoti\lib\site-packages\sentence_transformers\models\Transformer.py in _load_model(self, model_name_or_path, config, cache_dir)
47 self._load_t5_model(model_name_or_path, config, cache_dir)
48 else:
---> 49 self.auto_model = AutoModel.from_pretrained(model_name_or_path, config=config, cache_dir=cache_dir)
50
51 def _load_t5_model(self, model_name_or_path, config, cache_dir):
~\miniconda3\envs\atoti\lib\site-packages\transformers\models\auto\auto_factory.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
445 elif type(config) in cls._model_mapping.keys():
446 model_class = _get_model_class(config, cls._model_mapping)
--> 447 return model_class.from_pretrained(pretrained_model_name_or_path, *model_args, config=config, **kwargs)
448 raise ValueError(
449 f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
~\miniconda3\envs\atoti\lib\site-packages\transformers\modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
1310 elif os.path.join(pretrained_model_name_or_path, FLAX_WEIGHTS_NAME):
1311 raise EnvironmentError(
-> 1312 f"Error no file named {WEIGHTS_NAME} found in directory {pretrained_model_name_or_path} but "
1313 "there is a file for Flax weights. Use `from_flax=True` to load this model from those "
1314 "weights."
OSError: Error no file named pytorch_model.bin found in directory paraphrase-mpnet-base-v2 but there is a file for Flax weights. Use `from_flax=True` to load this model from those weights.
I'm using the versions below:
transformers==4.16.2
torch==1.11.0+cu113
torchaudio==0.11.0+cu113
torchvision==0.12.0+cu113
sentence-transformers==2.2.0
faiss-cpu==1.7.2
sentencepiece==0.1.96
It's been 2 months since I last ran this. All of a sudden, it's returning an error. I'm using FAISS-CPU as well.
The error is telling you that "I can't find the weights of the model you are trying to load."
Based on the error trace, I guess you are using the models object from the Sentence-Transformers library (correct me if I am wrong). One thing to note is that Sentence-Transformers only has the following paraphrase models as its pretrained models:
paraphrase-multilingual-mpnet-base-v2
paraphrase-albert-small-v2
paraphrase-multilingual-MiniLM-L12-v2
paraphrase-MiniLM-L3-v2
hence the one you wanted to load is not one of the Sentence-Transformers pretrained models.
That leads me to think that you are trying to load a model from your local machine.
I would suggest creating a Sentence-Transformers model like this:
from sentence_transformers import SentenceTransformer
model_path_or_name = "path/to/model" # A folder that contains model config files, including pytorch_model.bin
model = SentenceTransformer(model_path_or_name)
There is also a possibility that the pytorch_model.bin file was downloaded under another filename, as mentioned in the SO thread here.
Let me know if this solves your problem. Cheers.
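If the Flax-only checkpoint really is what needs loading, the hint in the error message itself could also be followed: convert the weights to PyTorch once with transformers (this requires flax to be installed), then point Sentence-Transformers at the converted folder. A rough sketch, with a placeholder output folder name:
from transformers import AutoModel, AutoTokenizer

# from_flax=True is the flag suggested by the error message
model = AutoModel.from_pretrained("paraphrase-mpnet-base-v2", from_flax=True)
tokenizer = AutoTokenizer.from_pretrained("paraphrase-mpnet-base-v2")

# save_pretrained writes pytorch_model.bin, which sentence-transformers can load
model.save_pretrained("paraphrase-mpnet-base-v2-pt")
tokenizer.save_pretrained("paraphrase-mpnet-base-v2-pt")
After that, models.Transformer('paraphrase-mpnet-base-v2-pt') should find the PyTorch weights.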
I am building a Discord bot which asks users for their name and Points. My spreadsheet has two columns with the first row as a header (Name, Points). My code takes the player's name and searches the spreadsheet; if the name is found, it updates the points. This works perfectly when the player name is already there, but I get a CellNotFound error when the name is not in the Google Sheet.
I have already tried solutions from other forums, but none are working, such as checking if len(cells) > 0 and catching except gspread.exceptions.CellNotFound: or except gspread.CellNotFound:.
import gspread
from oauth2client.service_account import ServiceAccountCredentials
from pprint import pprint
from googleapiclient import discovery
# SET UP GSHEETS DESTINATION
scope =["https://spreadsheets.google.com/feeds",'https://www.googleapis.com/auth/spreadsheets',"https://www.googleapis.com/auth/drive.file","https://www.googleapis.com/auth/drive"]
creds = ServiceAccountCredentials.from_json_keyfile_name("generated.json", scope)
client1 = gspread.authorize(creds)
content = "22"
if content.isdigit():
    # open spreadsheet
    sh = client1.open("test")
    worksheet = sh.sheet1
    nameof = 'player4'
    # find data with player name
    cells = worksheet.find(nameof)
    if cells != []:
        # capture player name data column and row number
        # print("found at R%s C%s" % (cells.row, cells.col))
        name_row_number = ("%s" % (cells.row))
        name_cell_number = ("%s" % (cells.col))
        old_points_cell_number = int(name_cell_number) + 1
        # print(old_points_cell_number)
        oldscore = worksheet.cell(name_row_number, old_points_cell_number).value
        # print(oldscore)
        worksheet.update_cell(name_row_number, old_points_cell_number, content)
    else:
        print("name not found")
Screenshot of my Google Sheet: https://i.stack.imgur.com/iw6O5.png
Below is the error message I get:
Traceback (most recent call last):
File "C:\Python39\lib\site-packages\gspread\models.py", line 1799, in find
return self._finder(finditem, query, in_row, in_column)
File "C:\Python39\lib\site-packages\gspread\models.py", line 1761, in _finder
return func(match, cells)
File "C:\Python39\lib\site-packages\gspread\utils.py", line 97, in finditem
return next((item for item in seq if func(item)))
StopIteration
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\abc\Desktop\Discord_Python\ReadExcel.py", line 38, in <module>
cells = worksheet.find(nameof)
File "C:\Python39\lib\site-packages\gspread\models.py", line 1801, in find
raise CellNotFound(query)
gspread.exceptions.CellNotFound: player4
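Not an official fix, but a minimal sketch of how the not-found case could be handled, using the same exception the traceback above shows (gspread.exceptions.CellNotFound) instead of comparing the result to []. The append_row fallback is just an example of what could be done when the player is missing:
try:
    cells = worksheet.find(nameof)
    worksheet.update_cell(cells.row, cells.col + 1, content)
except gspread.exceptions.CellNotFound:
    print("name not found")
    # e.g. add the player as a new row instead
    worksheet.append_row([nameof, content])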
I am trying to wear out a battery to see how long it powers my ESP32. To do this, I wrote a simple program that writes "Still alive" to a website once a minute. However, the program crashes every time it has written the string successfully to the web file. I can't figure out why it works nine times, then fails. Here is the code:
from time import sleep_ms, ticks_ms
import network
import socket
import urequests
import machine
import json
SSID="my_ssid"
PASSWORD="my_password"
port=100
wlan=None
s=None
def connectWifi(ssid, passwd):              # function to connect to the Web
    global wlan                             # declare a WLAN object
    wlan = network.WLAN(network.STA_IF)     # create a wlan object
    wlan.active(True)                       # activate the network interface
    wlan.disconnect()                       # disconnect the last connected WiFi
    wlan.connect(ssid, passwd)              # connect wifi
    while wlan.ifconfig()[0] == '0.0.0.0':  # wait for connection
        sleep_ms(1)
    sleep_ms(1000)                          # hold on for 1 second
    sendmessage("Connected to WLAN")
    sleep_ms(1000)                          # hold on for 1 second
    return True

def sendmessage(myMessage):
    url = "http://www.sabulo.com/stillalive.php"
    headers = {'content-type': 'application/json'}
    data = {'message': myMessage}
    jsonObj = json.dumps(data)
    resp = urequests.post(url, data=jsonObj, headers=headers)
    return True

def main():
    connectWifi(SSID, PASSWORD)
    hitcount = 0
    while True:
        sendmessage("Still alive " + str(hitcount))
        print("Still alive " + str(hitcount))
        hitcount = hitcount + 1
        sleep_ms(1000)
main()
Then I get this:
>>>
>>> I (5906) phy: phy_version: 4007, 9c6b43b, Jan 11 2019, 16:45:07, 0, 0
Still alive 0
Still alive 1
Still alive 2
Still alive 3
Still alive 4
Still alive 5
Still alive 6
Still alive 7
Still alive 8
Traceback (most recent call last):
File "<stdin>", line 45, in <module>
File "<stdin>", line 40, in main
File "<stdin>", line 33, in sendmessage
File "urequests.py", line 111, in post
File "urequests.py", line 56, in request
OSError: 23
>
MicroPython v1.10-98-g4daee3170 on 2019-02-14; ESP32 module with ESP32
Type "help()" for more information.
I am using the same web upload code piece in various apps and it never gives me this error.
Any help gratefully acknowledged :)
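Not a definitive diagnosis, but one common cause of urequests failing after a handful of successful posts on MicroPython is never closing the Response objects, so the board eventually runs out of sockets, which would fit the pattern of nine successes followed by an OSError. A sketch of sendmessage with the response closed (same code as above plus resp.close()):
def sendmessage(myMessage):
    url = "http://www.sabulo.com/stillalive.php"
    headers = {'content-type': 'application/json'}
    data = {'message': myMessage}
    jsonObj = json.dumps(data)
    resp = urequests.post(url, data=jsonObj, headers=headers)
    resp.close()  # free the underlying socket so repeated posts don't exhaust them
    return True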
I am currently working my way through caffe/examples/ to learn more about caffe/pycaffe.
In the 02-fine-tuning.ipynb notebook there is a code cell which shows how to create a caffenet that takes unlabeled "dummy data" as input, allowing us to set its input images externally. The notebook can be found here:
https://github.com/BVLC/caffe/blob/master/examples/02-fine-tuning.ipynb
The code cell in question throws an error:
dummy_data = L.DummyData(shape=dict(dim=[1, 3, 227, 227]))
imagenet_net_filename = caffenet(data=dummy_data, train=False)
imagenet_net = caffe.Net(imagenet_net_filename, weights, caffe.TEST)
error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-6-9f0ecb4d95e6> in <module>()
1 dummy_data = L.DummyData(shape=dict(dim=[1, 3, 227, 227]))
----> 2 imagenet_net_filename = caffenet(data=dummy_data, train=False)
3 imagenet_net = caffe.Net(imagenet_net_filename, weights, caffe.TEST)
<ipython-input-5-53badbea969e> in caffenet(data, label, train, num_classes, classifier_name, learn_all)
68 # write the net to a temporary file and return its filename
69 with tempfile.NamedTemporaryFile(delete=False) as f:
---> 70 f.write(str(n.to_proto()))
71 return f.name
~/anaconda3/envs/testcaffegpu/lib/python3.6/tempfile.py in func_wrapper(*args, **kwargs)
481 #_functools.wraps(func)
482 def func_wrapper(*args, **kwargs):
--> 483 return func(*args, **kwargs)
484 # Avoid closing the file as long as the wrapper is alive,
485 # see issue #18879.
TypeError: a bytes-like object is required, not 'str'
Does anyone know how to do this right?
tempfile.NamedTemporaryFile() opens a file in binary mode ('w+b') by default. Since you are using Python 3.x, str is not the same type as in Python 2.x, so passing a string to f.write() raises an error because the file expects bytes. Overriding the binary mode should avoid this error.
Replace
with tempfile.NamedTemporaryFile(delete=False) as f:
with
with tempfile.NamedTemporaryFile(delete=False, mode='w') as f:
This has been explained in a previous post:
TypeError: 'str' does not support the buffer interface
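An equivalent alternative, if the file should stay in binary mode, is to encode the string before writing:
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(str(n.to_proto()).encode())  # bytes, so the default 'w+b' mode is fine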
My dask dataframe has about 120 million rows and 4 columns:
df_final.dtypes
cust_id int64
score float64
total_qty float64
update_score float64
dtype: object
and I'm doing this operation in a Jupyter notebook connected to a Linux machine:
%time df_final.to_csv('/path/claritin-files-*.csv')
and it throws up this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-24-46468ae45023> in <module>()
----> 1 get_ipython().magic(u"time df_final.to_csv('path/claritin-files-*.csv')")
/home/mspra/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in magic(self, arg_s)
2334 magic_name, _, magic_arg_s = arg_s.partition(' ')
2335 magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2336 return self.run_line_magic(magic_name, magic_arg_s)
2337
2338 #-------------------------------------------------------------------------
/home/mspra/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in run_line_magic(self, magic_name, line)
2255 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
2256 with self.builtin_trap:
-> 2257 result = fn(*args,**kwargs)
2258 return result
2259
/home/mspra/anaconda2/lib/python2.7/site-packages/IPython/core/magics/execution.pyc in time(self, line, cell, local_ns)
/home/mspra/anaconda2/lib/python2.7/site-packages/IPython/core/magic.pyc in <lambda>(f, *a, **k)
191 # but it's overkill for just that one bit of state.
192 def magic_deco(arg):
--> 193 call = lambda f, *a, **k: f(*a, **k)
194
195 if callable(arg):
/home/mspra/anaconda2/lib/python2.7/site-packages/IPython/core/magics/execution.pyc in time(self, line, cell, local_ns)
1161 if mode=='eval':
1162 st = clock2()
-> 1163 out = eval(code, glob, local_ns)
1164 end = clock2()
1165 else:
<timed eval> in <module>()
/home/mspra/anaconda2/lib/python2.7/site-packages/dask/dataframe/core.pyc in to_csv(self, filename, **kwargs)
936 """ See dd.to_csv docstring for more information """
937 from .io import to_csv
--> 938 return to_csv(self, filename, **kwargs)
939
940 def to_delayed(self):
/home/mspra/anaconda2/lib/python2.7/site-packages/dask/dataframe/io/csv.pyc in to_csv(df, filename, name_function, compression, compute, get, **kwargs)
411 if compute:
412 from dask import compute
--> 413 compute(*values, get=get)
414 else:
415 return values
/home/mspra/anaconda2/lib/python2.7/site-packages/dask/base.pyc in compute(*args, **kwargs)
177 dsk = merge(var.dask for var in variables)
178 keys = [var._keys() for var in variables]
--> 179 results = get(dsk, keys, **kwargs)
180
181 results_iter = iter(results)
/home/mspra/anaconda2/lib/python2.7/site-packages/dask/threaded.pyc in get(dsk, result, cache, num_workers, **kwargs)
74 results = get_async(pool.apply_async, len(pool._pool), dsk, result,
75 cache=cache, get_id=_thread_get_id,
---> 76 **kwargs)
77
78 # Cleanup pools associated to dead threads
/home/mspra/anaconda2/lib/python2.7/site-packages/dask/async.pyc in get_async(apply_async, num_workers, dsk, result, cache, get_id, raise_on_exception, rerun_exceptions_locally, callbacks, dumps, loads, **kwargs)
491 _execute_task(task, data) # Re-execute locally
492 else:
--> 493 raise(remote_exception(res, tb))
494 state['cache'][key] = res
495 finish_task(dsk, key, state, results, keyorder.get)
ValueError: invalid literal for long() with base 10: 'total_qty'
Traceback
---------
File "/home/mspra/anaconda2/lib/python2.7/site-packages/dask/async.py", line 268, in execute_task
result = _execute_task(task, data)
File "/home/mspra/anaconda2/lib/python2.7/site-packages/dask/async.py", line 249, in _execute_task
return func(*args2)
File "/home/mspra/anaconda2/lib/python2.7/site-packages/dask/dataframe/io/csv.py", line 55, in pandas_read_text
coerce_dtypes(df, dtypes)
File "/home/mspra/anaconda2/lib/python2.7/site-packages/dask/dataframe/io/csv.py", line 83, in coerce_dtypes
df[c] = df[c].astype(dtypes[c])
File "/home/mspra/anaconda2/lib/python2.7/site-packages/pandas/core/generic.py", line 3054, in astype
raise_on_error=raise_on_error, **kwargs)
File "/home/mspra/anaconda2/lib/python2.7/site-packages/pandas/core/internals.py", line 3189, in astype
return self.apply('astype', dtype=dtype, **kwargs)
File "/home/mspra/anaconda2/lib/python2.7/site-packages/pandas/core/internals.py", line 3056, in apply
applied = getattr(b, f)(**kwargs)
File "/home/mspra/anaconda2/lib/python2.7/site-packages/pandas/core/internals.py", line 461, in astype
values=values, **kwargs)
File "/home/mspra/anaconda2/lib/python2.7/site-packages/pandas/core/internals.py", line 504, in _astype
values = _astype_nansafe(values.ravel(), dtype, copy=True)
File "/home/mspra/anaconda2/lib/python2.7/site-packages/pandas/types/cast.py", line 534, in _astype_nansafe
return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
File "pandas/lib.pyx", line 980, in pandas.lib.astype_intsafe (pandas/lib.c:17409)
File "pandas/src/util.pxd", line 93, in util.set_value_at_unsafe (pandas/lib.c:72777)
I have a couple of questions:
1) First of all, this export was working fine on Friday; it spit out 100 CSV files (since it has 100 partitions), which I later aggregated. So what is wrong today -- anything from the error log?
2) Maybe this question is for the creators of this package: what is the most time-efficient way to get a CSV extract out of a dask dataframe of this size, since it was taking about 1.5 to 2 hours the last time it was working?
I'm not using dask distributed, and this is on a single core of a Linux cluster.
This error likely has little to do with to_csv and more to do with something else in your computation. The call to df.to_csv was just the first time you forced the computation to roll through all of the data.
Given the error, I actually suspect that this is failing in read_csv. Dask.dataframe reads the first few hundred kilobytes of your first file to guess at the datatypes, but it seems to have guessed incorrectly. You might want to try specifying dtypes explicitly in the read_csv call.
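For example, a sketch only (the input path is a placeholder, and whether these dtypes match your data is an assumption based on the df_final.dtypes listing above):
import dask.dataframe as dd

df_final = dd.read_csv('/path/to/input-*.csv',
                       dtype={'cust_id': 'int64',
                              'score': 'float64',
                              'total_qty': 'float64',
                              'update_score': 'float64'})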
In regards to the second question about writing to CSV quickly, my first answer would be "use Parquet or HDF5 instead". They're much faster and more accurate in almost every respect.
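For instance, something along these lines (an untested sketch; to_parquet needs a Parquet engine such as fastparquet or pyarrow installed):
df_final.to_parquet('/path/claritin-files.parquet')  # one Parquet dataset instead of 100 CSV files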