I'm trying to stop empty CSV files from causing errors in my simple sampling program, which just grabs 2 values from each .csv file in a folder.
I have a null check, which now catches the empty file, but I'm unsure how to restructure my code so it skips that file in the array and moves on to the next one. Any assistance greatly welcomed.
foreach (string name in array1)
{
// sampling engine loop here, take first line only, first column DateTimeStamp and second is Voltage
Console.Write("\r Number of File currently being processed = {0,6}", i);
i++;
var reader = new StreamReader(File.OpenRead(name)); // Static for testing only, to be replaced by file filter code
var line = reader.ReadLine();
if (line == null)
{
Console.WriteLine("Null value detected");
Console.ReadKey();
break;
}
var values = line.Split(',');
reader.ReadLine();
if (values.Length == 89)
{
using (StreamWriter outfile = new StreamWriter(@"C:\SampledFileResults.txt", true))
{
string content = "";
{
content = content + values[0] + ",";
content = content + values[9] + ",";
}
outfile.WriteLine(content);
Console.WriteLine(content);
}
}
}
Console.WriteLine("SAMPLING COMPLETED");
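For reference, a minimal sketch of the skip being asked about: replacing the break with continue moves on to the next file in array1 instead of ending the loop entirely (disposing the reader first, since the code above never closes it):
if (line == null)
{
    Console.WriteLine("Empty file detected, skipping: {0}", name);
    reader.Dispose(); // the reader is never closed above; a using block would also work
    continue; // skip this file and move to the next one in array1
}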
I made a formula to extract some Wikipedia data in Google Sheets which works fine. Here is the formula:
=regexreplace(join("",flatten(IMPORTXML(D2,".//p[preceding-sibling::h2[1][contains(., 'Geography')]]"))),"\[[^\]]+\]","")&char(10)&char(10)&iferror(regexreplace(join("",flatten(IMPORTXML(D2,".//p[preceding-sibling::h2[1][contains(., 'Education')]]"))),"\[[^\]]+\]",""))
Where D2 is a URL like https://en.wikipedia.org/wiki/Abbeville,_Alabama
This extracts some Geography and Education data from the Wikipedia page. Trouble is that IMPORTXML only runs a few times before it dies due to quota.
So I thought maybe it would be better to use Apps Script, where there are much higher limits on fetching and parsing. I could not see a good way, however, of using XPath in Apps Script. Older posts on the web discuss using a deprecated service called Xml, but it seems to no longer work. There is a service called XmlService which looks like it may do the job, but you can't just plug in an XPath expression; it looks like a lot of sweating to get to the result. Are there any solutions out there where you can just plug in XPath?
Here is an alternative solution I actually use in a case like this.
I have used XmlService, but only for parsing the content, not for using XPath. This makes use of the element tags and has been pretty consistent in my tests so far. It might need tweaks when certain tags are in the result, though, and you might have to include them in the exclusion condition.
Tested the code below in both links:
https://en.wikipedia.org/wiki/Abbeville,_Alabama#Geography
https://en.wikipedia.org/wiki/Montgomery,_Alabama#Education
My test shows that the formula above did not return the proper output for the 2nd link, while the code does (maybe because the content was too long).
Code:
function getGeoAndEdu(path) {
var data = UrlFetchApp.fetch(path).getContentText();
// wikipedia is divided into sections, if output is cut, increase the number
var regex = /.{1,100000}/g;
var results = [];
// flag to determine if matches should be added
var foundFlag = false;
var m;
do {
m = regex.exec(data);
if (m != null && foundFlag) {
// if another header is found during generation of data, stop appending the matches
if (matchTag(m[0], "<h2>"))
foundFlag = false;
// exclude tables, sub-headers and divs containing image description
else if(matchTag(m[0], "<div") || matchTag(m[0], "<h3") ||
matchTag(m[0], "<td") || matchTag(m[0], "<th"))
continue;
else
results.push(m[0]);
}
// start capturing if either IDs are found
if (m != null && (matchTag(m[0], "id=\"Geography\"") ||
matchTag(m[0], "id=\"Education\""))) {
foundFlag = true;
}
} while (m);
var output = results.map(function (str) {
// clean tags for XmlService
str = str.replace(/<[^>]*>/g, '').trim();
var decode = XmlService.parse('<d>' + str + '</d>');
// convert HTML entity codes to text
return decode.getRootElement().getText();
// filter blank results due to cleaning and empty sections
// separate data and remove citations before returning output
}).filter(result => result.trim().length > 1).join("\n").replace(/\[\d+\]/g, '');
return output;
}
// check if tag is found in string
function matchTag(string, tag) {
var regex = RegExp(tag);
return string.match(regex) && string.match(regex)[0] == tag;
}
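For reference, a minimal test harness for the function above (the wrapper name is illustrative only):
function testGetGeoAndEdu() {
  // log the extracted Geography and Education text for one of the test URLs
  Logger.log(getGeoAndEdu('https://en.wikipedia.org/wiki/Abbeville,_Alabama'));
}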
Output: (screenshot omitted)
Difference (screenshots omitted): the formula's ending output is cut off, while the script's ending output matches the actual ending of the Education section on Wikipedia.
Note:
You still have a quota when using UrlFetchApp, but it should be better than IMPORTXML's limit, depending on the type of your account.
Reference:
Apps Script Quotas
Sorry, I got very busy this week, so I didn't reply. I took a look at your answer, which seems to work fine, but it was quite code-heavy. I wanted something I would understand, so I coded my own solution; not that mine is any simpler, it's just my own code, so it's easier for me to follow:
function getTextBetweenTags(html, paramatersInFirstTag, paramatersInLastTag) { //finds text values between 2 tags and removes internal tags to leave plain text.
//eg getTextBetweenTags(html,[['class="mw-headline"'],['id="Geography"']],[['class="wikitable mw-collapsible mw-made-collapsible"']])
// **Note: you may want to replace &#number; entities with the corresponding ASCII character
var openingTagPos = null;
var closingTagPos = null;
var previousChar = '';
var readingTag = false;
var newTag = '';
var tagEnd = false;
var regexFirstTagParams = [];
var regexLastTagParams = [];
//prepare regexes to test for parameters in opening and closing tags. put regexes in arrays so each condition can be tested separately
for (var i in paramatersInFirstTag) {
regexFirstTagParams.push(new RegExp(escapeRegex(paramatersInFirstTag[i][0])))
}
for (var i in paramatersInLastTag) {
regexLastTagParams.push(new RegExp(escapeRegex(paramatersInLastTag[i][0])))
}
var startTagIndex = null;
var endTagIndex = null;
var matches = 0;
for (var i = 0; i < html.length; i++) {
var nextChar = html.substr(i, 1);
if (nextChar == '<' && previousChar != '\\') {
readingTag = true;
}
if (nextChar == '>' && previousChar != '\\') { //if end of tag found, check tag matches start or end tag
readingTag = false;
newTag += nextChar;
//test for firstTag
if (startTagIndex == null) {
var alltestsPass = true;
for (var j in regexFirstTagParams) {
if (!regexFirstTagParams[j].test(newTag)) alltestsPass = false;
}
if (alltestsPass) {
startTagIndex = i + 1;
//console.log('Start Tag',startTagIndex)
matches++;
}
}
//test for lastTag
else if (startTagIndex != null) {
var alltestsPass = true;
for (var j in regexLastTagParams) {
if (!regexLastTagParams[j].test(newTag)) alltestsPass = false;
}
if (alltestsPass) {
endTagIndex = i + 1;
matches++;
}
}
if(startTagIndex && endTagIndex) break;
newTag = '';
}
if (readingTag) newTag += nextChar;
previousChar = nextChar;
}
if (matches < 2) return 'No matches';
else return html.substring(startTagIndex, endTagIndex).replace(/<[^>]+>/g, '');
}
function escapeRegex(string) {
if (string == null) return string;
return string.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}
My function requires an array of attributes for the start tag and an array of attributes for the end tag. It gets any text in between and removes any tags found in between. One issue I also noticed was that there were often special characters (HTML entities), so they need to be replaced; I did that outside the scope of the function above, along the lines of the sketch below.
The function could easily be improved to check the tag type (e.g. h2), but it wasn't necessary for the Wikipedia case.
Here is a function where I call the above function. The html variable is just the result of UrlFetchApp.fetch('some wikipedia city url').getContentText():
function getWikiTexts(html) {
var geography = getTextBetweenTags(html, [['class="mw-headline"'], ['id="Geography']], [['class="mw-headline"']]);
var economy = getTextBetweenTags(html, [['class="mw-headline"'], ['id="Economy']], [['class="mw-headline"']]);
var education = getTextBetweenTags(html, [['class="mw-headline"'], ['id="Education']], [['class="mw-headline"']]);
var returnString = '';
if (geography != 'No matches' && !/Wikipedia/.test(geography)) returnString += geography + '\n';
if (economy != 'No matches' && !/Wikipedia/.test(economy)) returnString += economy + '\n';
if (education != 'No matches' && !/Wikipedia/.test(education)) returnString += education + '\n';
return returnString
}
Thanks for posting your answer.
I am new to SSIS. I am facing the below issue while parsing a text file which contains the below sample data.
Below is the requirement:
-> Need to capture the number after IH1 (454756567) and insert it into one column as InvoiceNumber
-> Need to insert the data between ABCD1234 and ABCD2345 into another column as TotalRecord.
Many thanks for the help.
ABCD1234
IH1 454756567 686575634
IP2 HJKY TXRT
IBG 23455GHK
ABCD2345
IH1 689343256 686575634
IP2 HJKY TXRT
IBG 23455GHK
ABCD5678
This is the script component code to process the entire file. You need to create your output columns; they are currently being processed as strings.
This assumes your file format is consistent. If you don't have 2 columns in IH1 and IP2 ALL the time, I would recommend a for loop from 1 to len - 1 to process the tokens, and send the records to their own output (a sketch of that variant follows below).
public string recordID = String.Empty;
public override void CreateNewOutputRows()
{
string filePath = ""; //put your filepath here
using (System.IO.StreamReader sr = new System.IO.StreamReader(filePath))
{
while (!sr.EndOfStream)
{
string line = sr.ReadLine();
if (line.StartsWith("ABCD")) // Anything that identifies the start of a new record (and won't throw on short lines)
// line.Split(' ').Length == 1 also meets your criteria.
{
recordID = line;
Output0Buffer.AddRow();
Output0Buffer.RecordID = line;
}
string[] cols = line.Split(' ');
switch (cols[0])
{
case "IH1":
Output0Buffer.InvoiceNumber = cols[1];
Output0Buffer.WhatEverTheSecondColumnIs = cols[2];
break;
case "IP2":
Output0Buffer.ThisRow = cols[1];
Output0Buffer.ThisRow2 = cols[2];
break;
case "IBG":
Output0Buffer.Whatever = cols[1];
break;
}
}
}
}
You'll need to do this in a script component.
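As a rough sketch of the variable-column variant mentioned above (the buffer and column names here are placeholders, not actual output definitions):
// when IH1/IP2 rows may have a variable number of columns,
// walk the tokens instead of indexing fixed positions
string[] cols = line.Split(' ');
for (int i = 1; i < cols.Length; i++)
{
    // send each token to its own output row (names are hypothetical)
    DetailBuffer.AddRow();
    DetailBuffer.RecordType = cols[0];
    DetailBuffer.Position = i;
    DetailBuffer.Value = cols[i];
}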
The problem I'm running into is that I'm hitting a certain quota when processing my spreadsheets. I process a bunch of spreadsheets each day, and when I added a new system that sends my Google script more spreadsheets to process, I get the error:
Limit Exceeded DriveApp
The line that it ends on is always orderedCsv.getBlob().getDataAsString(); where orderedCsv is the current spreadsheet.
My questions are
1. Which quota could I be hitting?
2. How can I check my current quota usage?
I think it could be the Properties read/write quota being exceeded, since I import the original data, which could be anywhere from 3000-9000 lines of data.
The error transcript it gives me is:
Error Transcript Pastebin
function ps_csvsToSheet ( currSheet, sheetCsvs, csvDict, sheetN, sheetOrderIndex){
// import csvs into the sheet with formatting
lib_formatSheet(currSheet);
var row = 39;
var orderedCsv;
// loop for importing CSVs into one sheet in the order we want~~~~~~
for (var i = 0; i < ps_statOrdering.length; i++) {
// loop through all the sheets stored in a dictionary we created before
for (var j = 0; j < sheetCsvs.length; j++) {
var sheetName = sheetCsvs[j].getName();
// additional test to ensure Draw chart and not DrawCall
if ( ps_statOrdering[i] == 'Draw') {
if ( sheetName.indexOf(ps_statOrdering[i]) !== -1 && sheetName.indexOf('DrawCalls') == -1) {
orderedCsv = sheetCsvs[j];
break;
}
} else if ( sheetName.indexOf(ps_statOrdering[i]) !== -1) {
orderedCsv = sheetCsvs[j];
break;
}
}
try{
// import the csvs for spreadsheet
var strData = orderedCsv.getBlob().getDataAsString(); //**********[Line it ends on]***********
var importedData = lib_importCSV(row+1, 1, strData, currSheet);
}
catch(error) {
Logger.log("Catch Error : " + error);
return
}
// make formatting [][] for the importedData. Here we are working off
// of pre-knowledge of what is expected
var nRows = importedData['rows'];
var nCols = importedData['cols'];
var c;
var weightArr = new Array(nRows);
var numFormatArr = new Array(nRows);
for (var r = 0; r < nRows; r++) {
weightArr[r] = new Array(nCols);
numFormatArr[r] = new Array(nCols);
if (r == 0) {
c = nCols;
while(c--) {
weightArr[r][c] = "bold";
numFormatArr[r][c] = '';
}
} else {
c = nCols;
while(c--) {
weightArr[r][c] = "normal";
numFormatArr[r][c] = '0.00';
}
weightArr[r][0] = "bold";
numFormatArr[r][0] = '';
if( sheetOrderIndex !== -1) {
numFormatArr[r][0] = 'MMM.dd';
}
}
}
importedData['range'].setFontWeights(weightArr)
.setNumberFormats(numFormatArr);
//Create the header of the sheet
lib_inputSheetHeader(currSheet, row, nCols, (sheetN + " " + ps_statOrdering[i]
+ " Averages"), ps_profileColors[0]) ;
// insert appropriate graph
var key = ps_statOrdering[i];
if( sheetOrderIndex !== -1) {
// this is a setting trend sheet, line chart
lib_makeLineChart(importedData['range'], ps_statLocDict[key][0], ps_statLocDict[key][1],
(sheetN + " " + ps_statOrdering[i] ), currSheet,
ps_statVRange[key][0], ps_statVRange[key][1], ps_statAxisDict[key]);
} else {
// this is a map sheet, bar chart
// debugPrint(importedData['range'].getValues().toString());
lib_makeBarChart(importedData['range'], ps_statLocDict[key][0], ps_statLocDict[key][1],
(sheetN + " " + ps_statOrdering[i] ), currSheet,
ps_statVRange[key][0], ps_statVRange[key][1], ps_statAxisDict[key]);
}
row += importedData['rows'] +2;
} // for loop close, import csv ~~~~
sleep(1000);
SpreadsheetApp.flush();
}
So what I did was offload a lot of the calculations to Python with pandas. That way I could import a data sheet that is already pre-formatted and only run a couple of operations on it in Google, to save execution time.
The code I used to make this work is a bit large because of the specific data operations I had to do. Here is a quick summary of the code done in Python:
import os
import pandas as pd
import numpy as np  # used in case of needing np.nan in our data

class DataProcessing():
    def __init__(self):
        rawData = pd.read_csv( **<enter path to csv>** )
        # From here I would run operations on the dataframe named rawData
        # until the data frame matched what I needed it to look like.
        # PyCharm is a Python IDE that can help you visualize the data frame
        # through break points.
        # After I'm done modifying my dataframe I send it to my Google Drive.
        # If you download Google Drive to your PC you can send it to a folder
        # on your PC that will auto-sync files to your Google Drive.
        rawData.to_csv(os.path.join(**<GoogleFolder Path>**, csvName))
Learning pandas is a little tricky at the start, but here is a resource that helped me get my data modifications right:
https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf
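If it helps, a hypothetical way to run it (the entry point is illustrative only; the constructor above does all the work):
# running the script performs the pre-formatting and writes the CSV into the auto-synced folder
if __name__ == "__main__":
    DataProcessing()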
Hope this helps!
I have a question regarding my project, which is how to read a string in AS3.
Actually, I have a text file named test.txt. It consists of:
Sun,Mon,Tue,Wed,Thu,Fri,Sat
and then I want to put all of them into an array and then into a string, to show them in the dynamic text box called text_txt:
var myTextLoader:URLLoader = new URLLoader();
myTextLoader.addEventListener(Event.COMPLETE, onLoaded);
function onLoaded(e:Event):void
{
var days:Array = e.target.data.split(/\n/);
var str:String;
stage.addEventListener(MouseEvent.CLICK, arrayToString);
function arrayToString(e:MouseEvent):void
{
for (var i=0; i<days.length; i++)
{
str = days.join("");
text_txt.text = str + "\n" + ";"; //it does not work here
}
}
}
myTextLoader.load(new URLRequest("test.txt"));
BUT IT DOES NOT show them on separate lines with a ";" at the end of each line!
I can make it show them on separate lines, but only by putting them on separate lines in the txt file, and I still do not get the ";" at the end of each line unless I also put it in the text file at the end of each line.
And then I want to read the string and show an object from my library based on each word or line. For example:
// I do not know how to write this; is there a function to read a string and divide it into words after each space or line?
if (str.string="sun"){
show(obj01);
}
if (str.string="mon"){
show(obj02);
}
I hope I can get the answer to this question.
Please let me know if you cannot follow the concept of the last part; I will try to explain it more until you can help me.
Thanks in advance.
You must enable the multiline ability for your TextField (if you have not already).
Adobe AS3 doc:
join() Converts the elements in an array to strings, inserts the
specified separator between the elements, concatenates them, and
returns the resulting string. A nested array is always separated by a
comma (,), not by the separator passed to the join() method.
So str = days.join(""); converts the Array to a single string, and since the parameter passed to join is empty (""), there is nothing between the fetched lines. And text_txt.text = str + "\n" + ";"; only puts a newline and a ";" at the end of the whole text, once.
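A quick illustration of the difference (output shown in the comments):
var days:Array = ["Sun", "Mon", "Tue"];
trace(days.join(""));    // SunMonTue -- nothing between the elements
trace(days.join("\n"));  // Sun, Mon, Tue each on their own line
trace(days.join(";\n")); // same, but with ";" after every line except the last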
var myTextLoader:URLLoader = new URLLoader();
var days:Array;
myTextLoader.addEventListener(Event.COMPLETE, onLoaded);
function onLoaded(e:Event):void
{
days = e.target.data.split(/\n/);
var str:String;
stage.addEventListener(MouseEvent.CLICK, arrayToString);
}
myTextLoader.load(new URLRequest("test.txt"));
function arrayToString(e:MouseEvent):void
{
text_txt.multiline = true;
text_txt.wordWrap = true;
text_txt.autoSize = TextFieldAutoSize.LEFT;
text_txt.text = days.join("\n");
}
Also, I moved arrayToString out of onLoaded.
For the second question: to check for the existence of a word, it's better to use indexOf("word") instead of comparing with the "==" operator, because of invisible characters like "\r" or "\n".
if (str.indexOf("sun") >= 0){
show(obj01);
}
if (str.indexOf("mon") >= 0){
show(obj02);
}
Answer to the first part:
for (var i=0; i<days.length; i++)
{
str = days[i];
text_txt.text += str + ";" + "\n";
}
I hope I understood you correctly.
I wrote this from memory, sorry for any typos...
For the second part, add a switch-case
switch(str) {
case "sun":
Show(??);
break;
.
.
.
}
I wish to take selected data from a collection of CSV files. I have written code but am confused by its behaviour: it reads all the lines of each file, not just the first. What am I doing wrong, please?
string[] array1 = Directory.GetFiles(WorkingDirectory, "00 DEV1 2????????????????????.csv"); //excludes "repaired" files from array, and "Averaged" logs, if found, note: does not exclude duplicate files if they exist (yet)
Console.WriteLine(" Number of Files found with the filter applied = {0,6}", (array1.Length));
int i = 1;
foreach (string name in array1)
{
// sampling engine loop here, take first line only, first column DateTimeStamp and second is Voltage
Console.Write("\r Number of File currently being processed = {0,6}", i);
i++;
var reader = new StreamReader(File.OpenRead(name)); // Static for testing only, to be replaced by file filter code
reader.ReadLine();
reader.ReadLine(); // skip headers, read and do nothing
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
var values = line.Split(',');
using (StreamWriter outfile = new StreamWriter(@"C:\SampledFileResults.txt", true))
{
string content = "";
{
content = content + values[0] + ",";
content = content + values[9] + ",";
}
outfile.WriteLine(content);
Console.WriteLine(content);
}
}
}
Console.WriteLine("SAMPLING COMPLETED");
Console.ReadLine();
Console.WriteLine("Test ended on {0}", (DateTime.Now));
Console.ReadLine();
}
}
You are using a while loop to read through all lines of the file. If you only want a single line, you can remove this loop.
Just delete the line:
while (!reader.EndOfStream)
{
And the accompanying close bracket
}
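For illustration, a rough sketch of the resulting loop (array1, i, the output path and the values[9] index all come from the question; wrapping the reader in a using block and the null-check skip are extra tidy-ups, not part of the original code):
foreach (string name in array1)
{
    Console.Write("\r Number of File currently being processed = {0,6}", i);
    i++;
    using (var reader = new StreamReader(File.OpenRead(name)))
    {
        reader.ReadLine();
        reader.ReadLine(); // skip headers, read and do nothing
        var line = reader.ReadLine(); // first data line only
        if (line == null) continue; // empty file: skip to the next one
        var values = line.Split(',');
        using (StreamWriter outfile = new StreamWriter(@"C:\SampledFileResults.txt", true))
        {
            string content = values[0] + "," + values[9] + ",";
            outfile.WriteLine(content);
            Console.WriteLine(content);
        }
    }
}
Console.WriteLine("SAMPLING COMPLETED");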