finBert Model - Config JSON File - Outputs Nothing - json

This is for running the ProsusAI finBert Model.
(https://github.com/ProsusAI/finBERT - GitHub)
(https://huggingface.co/ProsusAI/finbert - HuggingFace)
I downloaded the pytorch_model.bin file and used the config.json file that is shown in its GitHub repository.
There is no download button for the config.json file, so I created a JSON file from Python and saved it as "config.json". I then placed both pytorch_model.bin and config.json into a folder called "FinBertProsus". The program itself is saved under the file name "FinBert Model" and sits outside that folder, alongside my other Python programs.
When I run the "FinBert Model" program, it outputs nothing: the shell stays blank. Why is it outputting nothing, and how do I correct this? (I have tried adding a name/path key-value pair and an architecture key-value pair to config.json, but I get the same result: the program outputs nothing.)
Also, is it necessary to download and install Git LFS for this type of model?
The config.json in the GitHub repository is different from the one in the HuggingFace repository. When I use the one from the HuggingFace repository, I get an error for the config file. I decided to use the one from the GitHub repository, as its config.json is cleaner, and the exercise I am following recommends working from the GitHub repository. Below is the code for the FinBert Model program and the JSON config file.
FinBert Model program below,
from transformers import BertTokenizer, BertForSequenceClassification
import torch
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased') # bert-base-uncased
model = BertForSequenceClassification.from_pretrained('FinBertProsus/pytorch_model.bin', config = 'FinBertProsus/config.json', num_labels=3)
inputs = tokenizer('We had a great year', return_tensors='pt')
outputs = model(**inputs)
config.json file is below,
{
"attention_probs_dropout_prob": 0.1,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"max_position_embeddings": 512,
"num_attention_heads": 12,
"num_hidden_layers": 12,
"type_vocab_size": 2,
"vocab_size": 30522
}
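One thing worth checking, offered as a guess since the question shows no error message: the program stores the model result in `outputs` but never calls `print`, and a script run as a program displays only what it prints (an interactive prompt echoes values, a script does not). The sketch below assumes the scores have already been pulled out of the model result as a plain list (for example via `outputs.logits[0].tolist()` in recent transformers releases). The numbers and the label order are placeholders, not values taken from FinBERT.

```python
import math

# Hypothetical label order; check the id2label mapping of the model you load.
LABELS = ["positive", "negative", "neutral"]

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

def top_label(logits, labels=LABELS):
    """Return the most probable label together with its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Placeholder scores standing in for the model output of one sentence.
example_logits = [2.0, -1.0, 0.5]
label, prob = top_label(example_logits)
print(f"{label}: {prob:.3f}")  # without a print call, the shell stays blank
```

Separately, `from_pretrained` is normally given the folder that holds config.json and the weights (here that would be 'FinBertProsus') rather than the path of the .bin file itself.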

Related

Import pre-trained Deep Learning Models into Foundry Codeworkbooks

How do you import a h5 model locally from Foundry into code workbook?
I want to use the Hugging Face transformers library, and in its documentation the from_pretrained method expects a URL path to where the pretrained model lives.
I would ideally like to download the model onto my local machine, upload it onto Foundry, and have Foundry read in said model.
For reference, I’m trying to do this in Code Workbook or Code Authoring. It looks like you can work directly with files from there, but the example given in the documentation was for a CSV file, whereas this model consists of a variety of files in formats such as h5 and JSON. I am wondering how I can access these files and have them passed into the from_pretrained method from the transformers package.
Relevant links:
https://huggingface.co/transformers/quicktour.html
Pre-trained Model:
https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english/tree/main
Thank you!
I've gone ahead and added the transformers (hugging face) package onto the platform.
As for uploading the package, you can follow these steps:
1. Use your dataset with the model-related files as an input to your code workbook transform.
2. Use Python's raw file access to access the contents of the dataset: https://docs.python.org/3/library/filesys.html
3. Use Python's built-in tempfile to build a folder and add the files from step 2: https://docs.python.org/3/library/tempfile.html#tempfile.mkdtemp , https://www.kite.com/python/answers/how-to-write-a-file-to-a-specific-directory-in-python
4. Pass the temporary folder (tempfile.mkdtemp() returns its absolute path) to the from_pretrained method.
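import os
import tempfile
# Class names assumed for the DistilBERT model linked in the question
from transformers import DistilBertTokenizer, TFDistilBertForSequenceClassification
def sample(dataset_with_model_folder_uploaded):
    # Temporary folder that will hold the model files
    full_folder_path = tempfile.mkdtemp()
    # List every file the model needs; '...' marks the names left out here
    all_file_names = ['config.json', 'tf_model.h5', 'ETC.ot', ...]
    for file_name in all_file_names:
        # Binary mode, because weight files such as .h5 are not text
        with dataset_with_model_folder_uploaded.filesystem().open(file_name, 'rb') as f:
            path_of_file = os.path.join(full_folder_path, file_name)
            with open(path_of_file, 'wb') as new_file:
                new_file.write(f.read())
    model = TFDistilBertForSequenceClassification.from_pretrained(full_folder_path)
    tokenizer = DistilBertTokenizer.from_pretrained(full_folder_path)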
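The staging step can be tried outside Foundry with a small stand-in. This is only a sketch: `stage_files` and `open_file` are hypothetical names, and `open_file` is any callable that returns a binary file-like object for a given file name, taking the place of the dataset's file access. The in-memory bytes below are made-up contents, not real model files.

```python
import io
import os
import tempfile

def stage_files(open_file, file_names):
    """Copy each named file into a fresh temp folder and return the folder path."""
    folder = tempfile.mkdtemp()
    for name in file_names:
        # Copy in binary mode so weight files are written byte for byte.
        with open_file(name) as src, open(os.path.join(folder, name), "wb") as dst:
            dst.write(src.read())
    return folder

# Fake files held in memory instead of a real dataset.
fake_files = {"config.json": b"{}", "tf_model.h5": b"\x89HDF\r\n\x1a\n"}
folder = stage_files(lambda name: io.BytesIO(fake_files[name]), fake_files)
print(sorted(os.listdir(folder)))
```

The returned folder path is what would then be handed to from_pretrained.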
Thanks,

Why Python script run through batch file does not write to json file?

I have a Python script that does some web scraping, then opens a JSON file in the same directory and dumps the parsed data into it. Everything works when the script is run manually through the CLI, but the data does not get written to the JSON file when the script is run from a batch file started by a task scheduler.
I have managed to show that all the data exists within the Python script when it is run through the batch file. Somehow only the part of the script that deals with the JSON file is not run.
Python script:
# Packages used:
import requests
from bs4 import BeautifulSoup
import smtplib
import time
from win10toast import ToastNotifier
import json
# Web Scraping...
my_json = {}
def function1():
    # Web scraping for data...
    json_function(data)
# Below is the function that is not functioning
def json_function(data):
    my_json[time.strftime("%Y-%m-%d %H:%M")] = f"{data}"
    with open('json_file.json') as my_dict:
        info = json.load(my_dict)
    info.update(my_json)
    with open('json_file.json', 'w') as my_dict:
        json.dump(info, my_dict)
# A few other functions that work regardless...
# Call function
function1()
Batch file:
"C:\Users\...pythonw.exe" "C:\Users...script.pyw"
JSON file:
{"Key":"Value"}
Every file is in the same directory.
When run from the CLI, the expected result occurs: the key-value pairs are appended to the JSON file. When run automatically (through the batch file and task scheduler), there are no visible errors, and all of the script, save for json_function, runs as expected.
Thank you to #PRMoureu for the answer, and #Mofi for a detailed explanation.
The answer is to reference every file by its full path:
def json_function(data):
    my_json[time.strftime("%Y-%m-%d %H:%M")] = f"{data}"
    with open('C:/.../json_file.json') as my_dict:
        info = json.load(my_dict)
    info.update(my_json)
    with open('C:/.../json_file.json', 'w') as my_dict:
        json.dump(info, my_dict)
Or, direct the Task Scheduler to the working directory to avoid the batch being run in the default, root directory.
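A third option, sketched here as an assumption rather than something taken from the answers, is to build the path from the script's own location, so it no longer matters which working directory the scheduler uses. `default_json_path` and `append_entry` are hypothetical helper names.

```python
import json
import time
from pathlib import Path

def default_json_path():
    """Path of json_file.json in the same folder as this script."""
    return Path(__file__).resolve().parent / "json_file.json"

def append_entry(json_path, data):
    """Add a timestamped entry to the JSON object stored at json_path."""
    json_path = Path(json_path)
    with json_path.open() as f:
        info = json.load(f)
    info[time.strftime("%Y-%m-%d %H:%M")] = f"{data}"
    with json_path.open("w") as f:
        json.dump(info, f)
    return info
```

Inside the script, the call would then be `append_entry(default_json_path(), data)`.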

Pandas module read_csv reads file within Eclipse+pydev while fail if I run standalone

I'm currently developing a GUI using Python and Tkinter.
One of the tasks is to open and read some *.csv files.
In order to perform this task I have written the following code:
ReadData=pd.read_csv(ResultFile,skipinitialspace=True).values
While I'm running the code within the IDE Eclipse+PyDev, everything works fine. But as soon as I run my code from a DOS window, i.e. python MainGrap.py, the code fails, stating that the file doesn't exist.
I first load the path to a file via self.Inp_Filename=askopenfilename(), then I create a list of the folders by means of the following function:
def PathDisintegrator(Inp_File):
    Folders = os.path.split(Inp_File)
    LastFolder = Folders[1]
    RootPath = Folders[0]
    Dirs=[]
    while not(LastFolder==''):
        Dirs.insert(0,LastFolder)
        Folders = os.path.split(RootPath)
        LastFolder = Folders[1]
        RootPath = Folders[0]
    Dirs.insert(0,RootPath[:-1])
    Dirs=Dirs[:-1]
    return(Dirs)
Then I can recreate the full path to file via the following function:
def PathAndFile(Folders,File):
    FileOut=''
    for item in Folders:
        FileOut=FileOut+os.sep+item
        #FileOut=FileOut+r"\\"+item
    FileOut=FileOut[1:]+os.sep+File
    return(FileOut)
I have printed out the file path even within the parser of Pandas and it looks fine to me: D:\Abaqus_Runs\DOWLEX_PET_LAMINATE_PROTO_REFERENCE_SI_Version_2_Revision_2_MDangle0_Rate0_01_MOVING_NODE_out.csv
The problem here is that your python environment in eclipse can see the folder where your csv resides but the terminal one does not.
You can observe what the system paths are by doing:
In [331]:
import sys
sys.path
Out[331]:
['',
'C:\\WinPython-64bit-3.4.2.4\\python-3.4.2.amd64\\python34.zip',
'C:\\WinPython-64bit-3.4.2.4\\python-3.4.2.amd64\\DLLs',
'C:\\WinPython-64bit-3.4.2.4\\python-3.4.2.amd64\\lib',
'C:\\WinPython-64bit-3.4.2.4\\python-3.4.2.amd64',
'C:\\WinPython-64bit-3.4.2.4\\python-3.4.2.amd64\\lib\\site-packages',
'C:\\WinPython-64bit-3.4.2.4\\python-3.4.2.amd64\\lib\\site-packages\\win32',
'C:\\WinPython-64bit-3.4.2.4\\python-3.4.2.amd64\\lib\\site-packages\\win32\\lib',
'C:\\WinPython-64bit-3.4.2.4\\python-3.4.2.amd64\\lib\\site-packages\\Pythonwin',
'C:\\WinPython-64bit-3.4.2.4\\python-3.4.2.amd64\\lib\\site-packages\\IPython\\extensions']
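So you need to provide a complete path to the csv file. A relative file name is resolved against the current working directory (see os.getcwd()), which can differ between Eclipse and a terminal; sys.path itself only controls where Python looks for modules. Note that in a normal string literal backslashes must be escaped, e.g. 'c:\\data\\my.csv', whereas forward slashes work as they are, e.g. 'c:/data/my.csv'.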
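To see which folder a file name is actually resolved against, a small check like the following can be run both from Eclipse and from the terminal. It is a sketch only: `describe_path` is a hypothetical helper and "my.csv" is a placeholder file name.

```python
import os

def describe_path(path):
    """Show how a path resolves from the current working directory."""
    absolute = os.path.abspath(path)
    return {
        "cwd": os.getcwd(),
        "absolute": absolute,
        "exists": os.path.isfile(absolute),
    }

# The escaped spelling and the raw-string spelling are the same string.
assert "c:\\data\\my.csv" == r"c:\data\my.csv"

# The result depends on the folder this is run from.
print(describe_path("my.csv"))
```

If "cwd" differs between the two environments, a relative file name will point at different places in each.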

Is there a program that will generate a JSON config file based on a template and variables?

I have a project where I periodically create JSON files, and it would be great to be able to template a JSON file, enter a few parameters, and get a complete JSON file as output. Imagine Sass for JSON. Are there any programs like that out there?
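Short of a dedicated tool, the idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an existing program: the "$name" placeholder convention, the `render` function, and the sample template are all made up for this example.

```python
import json

def render(node, params):
    """Recursively replace "$name" strings in parsed JSON with values from params."""
    if isinstance(node, dict):
        return {key: render(value, params) for key, value in node.items()}
    if isinstance(node, list):
        return [render(item, params) for item in node]
    if isinstance(node, str) and node.startswith("$"):
        return params[node[1:]]  # raises KeyError if a parameter is missing
    return node

template = json.loads('{"name": "$service", "port": "$port", "tags": ["$env", "web"]}')
config = render(template, {"service": "api", "port": 8080, "env": "prod"})
print(json.dumps(config, indent=2))
```

Substituting on the parsed structure rather than on the raw text keeps the output valid JSON and lets a placeholder be replaced by a number, list, or object as well as a string.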

get data from .csv file, analyze, produce output - python3

I am trying to complete an assignment in Python3. It is very similar to the pdf found here
I have a few questions, both on how to get the information I need and, if possible, about some code that could move me along. I am new to Python. With the code I have right now, I keep getting the error "directory not found" when I run a function that tries to read the data. I know the .csv file should be in the directory where I save my work in WingIDE, but I can't get it to work correctly.
My first question is: after getting each line of the .csv file from my get_data_list function, what is the best way to take each category and feed it into an efficiency equation?
Here is my get_data_list function:
def get_data_list(filename):
    data_file = open(filename, "r")
    data_list = []
    for line_str in data_file:
        data_list.append(line_str.strip().split(','))
    return data_list
When I run get_data_list("player_regular_season.csv") I get the following error:
builtins.IOError: [Errno 2] No such file or directory:'player_regular_season.csv'
For a first try, put the data file in the same directory as the Python program and launch the program from that directory.
Also try writing a single-purpose script to learn how to work with directories. Learn the functions from the standard documentation, section 15.1.5. Files and Directories, namely os.getcwd() and os.chdir(path), and then 10.1. os.path — Common pathname manipulations, namely os.path.isfile(path).
Read the documentation of the other functions in those sections as well, to learn what is available.
Once you know how to work with filenames and paths, have a look at 13.1. csv — CSV File Reading and Writing. So as not to be overwhelmed by all the material, start from the end: 13.1.5. Examples of using the csv module.
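Putting those pieces together, one possible version of get_data_list is sketched below. It is an assumption about how the advice could be applied, not the assignment's required solution: it checks for the file first and reports the current working directory in the error, then lets the csv module do the splitting.

```python
import csv
import os

def get_data_list(filename):
    """Read a CSV file into a list of rows, with a clearer error if it is missing."""
    if not os.path.isfile(filename):
        raise FileNotFoundError(
            f"{filename!r} not found; current directory is {os.getcwd()!r}"
        )
    with open(filename, newline="") as data_file:
        return [row for row in csv.reader(data_file)]
```

The error message then shows directly whether the program was launched from the folder that holds the data file.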