Removing duplicates from a large dataset - duplicates

I have a dataset of more than 800000 rows and every even line is a duplicate of the odd one before it. I'd like to remove the duplicates. Please can someone assist?

You could try this: it uses buffered reading and writing to go through the file line by line, skipping every other line. (I don't currently have access to a compiler to shake out any small bugs; if you run into problems, comment and I'll edit.)
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

Charset charset = Charset.forName("US-ASCII"); // Change to the right charset
Path toRead = Paths.get("largefile.txt");
Path toWrite = Paths.get("filteredfile.txt");
// Open the reader and the writer once. Reopening the writer for every line
// would truncate the output file each time and keep only the last line.
try (BufferedReader reader = Files.newBufferedReader(toRead, charset);
     BufferedWriter writer = Files.newBufferedWriter(toWrite, charset)) {
    String line;
    boolean keep = true; // true on odd lines (kept), false on even lines (skipped)
    while ((line = reader.readLine()) != null) {
        if (keep) {
            writer.write(line);
            writer.newLine();
        }
        keep = !keep;
    }
} catch (IOException x) {
    System.err.format("IOException: %s%n", x);
}

I think you should give more information on this, such as the programming language you are using.
My guess is that you should change the query to avoid duplicates (even using a "distinct" should work).
Please post more info so that we can help you.
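If it is not certain that every even line really duplicates the line before it, a slightly safer variant is to compare each line with the previous one and drop it only when the two are equal. The following is a minimal sketch of that idea, not the original poster's method; the class and method names are made up for illustration. It works on any Reader/Writer pair, so the file readers and writers from Files.newBufferedReader and Files.newBufferedWriter can be passed in directly.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class AdjacentDedup {
    /**
     * Copies lines from in to out, dropping every line that is equal to the
     * line directly before it. Returns the number of lines written.
     */
    public static long copyWithoutAdjacentDuplicates(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        BufferedWriter writer = new BufferedWriter(out);
        String previous = null; // no previous line before the first one
        String line;
        long written = 0;
        while ((line = reader.readLine()) != null) {
            if (!line.equals(previous)) {
                writer.write(line);
                writer.newLine();
                written++;
            }
            previous = line;
        }
        writer.flush(); // the caller owns the streams, so flush but do not close
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Small in-memory demonstration with made-up data
        StringWriter out = new StringWriter();
        long kept = copyWithoutAdjacentDuplicates(new StringReader("a\na\nb\nb\nc\nc\n"), out);
        System.out.println(kept + " lines kept");
    }
}
```

Note the difference from the skip-every-other-line approach: if two genuinely distinct consecutive records happen to have identical content, this version collapses them into one, so it only fits data where adjacent equal lines are always unwanted.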


kotlin try catch block with bundle.getString

We have intent extras that are passed back to an Activity.
The code is written in Kotlin 1.3 and is posted below.
We do not understand why the code needs to be in a try/catch block.
Our question: is there a better way to write this code, and can someone explain why it requires a try/catch block? We know it could be written with when.
The navigation back to this Activity is done with various intents that do not always put all the values that the bundle reads.
One button uses this code:
val intent = Intent(this,MainActivity::class.java)
intent.putExtra("FROM", "NEW")
intent.addFlags(Intent.FLAG_ACTIVITY_NO_ANIMATION)
startActivity(intent)
While another button uses this code
holder.ivEdit.setOnClickListener {
//val rowid = friendList.get(position).id
val intent = Intent(context, MainActivity::class.java)
intent.putExtra("FROM", "UPDATE")
intent.putExtra("recordID", items.id)
intent.putExtra("PERSON", items.person)
intent.putExtra("PHONE", items.phone)
intent.flags = Intent.FLAG_ACTIVITY_NEW_TASK
context.startActivity(intent)
}
Here is the code that is in the Activity that has the try catch code
The code is inside the onCreate function
try {
val bundle: Bundle = intent.extras
from = bundle.getString("FROM","")
txtPerson = bundle.getString("PERSON","")
txtPhone = bundle.getString("PHONE","")
if(from == "UPDATE") {
showMSG("To CANCEL use back button")
id = bundle.getInt("recordID", 4)
btnAdd.visibility = View.INVISIBLE
btnEdit.visibility = View.VISIBLE
btnViewList.visibility = View.INVISIBLE
etPerson.setText(txtPerson)
etPhone.setText(txtPhone)
}else if (from == "DELETE"){
showMSG("To CANCEL use back button")
btnAdd.visibility = View.INVISIBLE
btnViewList.visibility = View.INVISIBLE
btnEdit.visibility = View.INVISIBLE
btnDelete.visibility = View.VISIBLE
etPerson.setText(txtPerson)
etPhone.setText(txtPhone)
etPerson.isEnabled = false
etPhone.isEnabled = false
}else{
btnViewList.visibility = View.VISIBLE
btnAdd.visibility = View.VISIBLE
btnEdit.visibility = View.INVISIBLE
}
if (id != 0) {
//etPerson.setText(txtPerson)
//etPhone.setText(txtPhone)
}
} catch (ex: Exception) {
}
My guess is that the Activity with the try/catch is also navigated to by another activity that passes no extras, so intent.extras returns null.
The declaration val bundle: Bundle is non-nullable, so assigning a null intent.extras to it throws ("intent.extras must not be null"); if it can be null, you need a way to deal with that.
I do not see a better way around the issue than the try/catch block;
perhaps someone can offer another solution.

LibXML C++ XPathEval Errors

For starters, I'm seeing two types of problems with the functionality of my code. I can't seem to find the correct element with the function xmlXPathEvalExpression. In addition, I am receiving errors similar to:
HTML parser error : Unexpected end tag : a
This happens for what appears to be all tags in the page.
For some background, the HTML is fetched by CURL and fed into the parsing function immediately after. For the sake of debugging, the return statements have been replaced with printf.
std::string cleanHTMLDoc(std::string &aDoc, std::string &symbolString) {
std::string ctxtID = "//span[id='" + symbolString + "']";
htmlDocPtr doc = htmlParseDoc((xmlChar*) aDoc.c_str(), NULL);
xmlXPathContextPtr context = xmlXPathNewContext(doc);
xmlXPathObjectPtr result = xmlXPathEvalExpression((xmlChar*) ctxtID.c_str(), context);
if (xmlXPathNodeSetIsEmpty(result->nodesetval)) {
xmlXPathFreeObject(result);
xmlXPathFreeContext(context);
xmlFreeDoc(doc);
printf("[ERR] Invalid XPath\n");
return "";
}
else {
int size = result->nodesetval->nodeNr;
for (int i = size - 1; i >= 0; --i) {
printf("[DBG] %s\n", result->nodesetval->nodeTab[i]->name);
}
return "";
}
}
The parameter aDoc contains the HTML of the page, and symbolString contains the id of the item we're looking for; in this case yfs_l84_aapl. I have verified that this is an element on the page in the style span[id='yfs_l84_aapl'] or <span id="yfs_l84_aapl">.
From what I've read, the errors fed out of the HTML Parser are due to a lack of a namespace, but when attempting to use the XHTML namespace, I've received the same error. When instead using htmlParseChunk to write out the DOM tree, I do not receive these errors due to options such as HTML_PARSE_NOERROR. However, the htmlParseDoc does not accept these options.
For the sake of information, I am compiling with Visual Studio 2015 and have successfully compiled and executed programs with this library before. My apologies for the poorly formatted code. I recently switched from writing Java in Eclipse.
Any help would be greatly appreciated!
[Edit]
It's not a pretty answer, but I made what I was looking to do work. Instead of looking through the DOM by my (assumed) incorrect XPath expression, I moved through tag by tag to end up where I needed to be, and hard-coded in the correct entry in the nodeTab attribute of the nodeSet.
The code is as follows:
std::string StockIO::cleanHTMLDoc(std::string htmlInput) {
std::string ctxtID = "/html/body/div/div/div/div/div/div/div/div/span/span";
xmlChar* xpath = (xmlChar*) ctxtID.c_str();
htmlDocPtr doc = htmlParseDoc((xmlChar*) htmlInput.c_str(), NULL);
xmlXPathContextPtr context = xmlXPathNewContext(doc);
xmlXPathObjectPtr result = xmlXPathEvalExpression(xpath, context);
if (xmlXPathNodeSetIsEmpty(result->nodesetval)) {
xmlXPathFreeObject(result);
xmlXPathFreeContext(context);
xmlFreeDoc(doc);
printf("[ERR] Invalid XPath\n");
return "";
}
else {
xmlNodeSetPtr nodeSet = result->nodesetval;
xmlNodePtr nodePtr = nodeSet->nodeTab[1];
return (char*) xmlNodeListGetString(doc, nodePtr->children, 1);
}
}
I will leave this question open in hopes that someone will help elaborate upon what I did wrong in setting up my XPath expression.
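One likely cause of the empty result is the predicate itself. In XPath 1.0, which libxml2 implements, a bare name inside a predicate refers to a child element, so //span[id='...'] looks for a span that has a child element named id. Attributes are selected with @, as in //span[@id='...']. The sketch below shows the difference using the XPath engine that ships with the JDK rather than libxml2. The page content is a hypothetical, well-formed stand-in for the fetched HTML, and the class name is made up for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathIdDemo {
    // Hypothetical stand-in for the fetched page (the real page is far larger)
    static final String PAGE =
        "<html><body><div><span id=\"yfs_l84_aapl\">placeholder</span></div></body></html>";

    /** Returns how many nodes the given XPath expression selects in PAGE. */
    public static int count(String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            expression, new InputSource(new StringReader(PAGE)), XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws XPathExpressionException {
        // "id" without @ means a child element <id>, which the span does not have
        System.out.println(count("//span[id='yfs_l84_aapl']"));  // prints 0
        // "@id" selects the id attribute of the span
        System.out.println(count("//span[@id='yfs_l84_aapl']")); // prints 1
    }
}
```

If that is the problem, changing the string built in cleanHTMLDoc to "//span[@id='" + symbolString + "']" should let xmlXPathEvalExpression find the element without the hard-coded path.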

Skipping unwanted rows while reading a flat file in SSIS

I have an SSIS package that reads data from a text file. The issue I am facing is that the data in the text file is not straightforward: it contains special characters that are causing trouble.
For example, right after the header row there is a row full of hyphens, something like -----------------------------------------------------------------------------------------
SSIS reads this as the first value of the first column, because of which it fails. How do I get rid of this without actually removing the row from the file itself?
Also, later in the file there are some unwanted rows that I would like to ignore. The format of the file is something like this:
Header
Data
Random Rows
Same header row as above
Data
and so on.....
I would like to know if there's a way to handle this with script task or any other way before or while the 'Flat File source' task gets executed, without actually making changes in the original file.
I don't know of any way to filter these rows on input using the Flat File Source component, but you can definitely do some filtering if you read the file in with a Script Component.
If you add a reference to Microsoft.VisualBasic, you can use the below function to read your CSV into a datatable:
// Requires: using System; using System.Data; using Microsoft.VisualBasic.FileIO;
public static DataTable ReadInDataFromCSV(string fileName, string delimiter)
{
    DataTable dtOutput = new DataTable();
    // How many lines to read in. 0 for unlimited
    int numberOfLines = 0;
    using (TextFieldParser parser = new TextFieldParser(fileName))
    {
        parser.TextFieldType = FieldType.Delimited;
        parser.SetDelimiters(delimiter);
        // Are column names in first row?
        bool columnNamesInFirstRow = true;
        int rowCounter = 0;
        string[] currentRow;
        while (!parser.EndOfData && rowCounter <= numberOfLines)
        {
            try
            {
                currentRow = parser.ReadFields();
                /*****************************
                Add some kind of logic here to skip over rows you don't
                want to read in
                *****************************/
                if (columnNamesInFirstRow)
                {
                    // The first row supplies the column names
                    foreach (string column in currentRow)
                    {
                        dtOutput.Columns.Add(column);
                    }
                    columnNamesInFirstRow = false;
                }
                else
                {
                    // Every later row becomes a DataRow
                    DataRow dr = dtOutput.NewRow();
                    dr.ItemArray = currentRow;
                    dtOutput.Rows.Add(dr);
                }
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
            }
            // The counter only advances when a line limit is set
            rowCounter += (numberOfLines == 0) ? 0 : 1;
        }
    }
    return dtOutput;
}
By default, the above code will read a flat file into a DataTable when called with something like:
DataTable myInputData = ReadInDataFromCSV(@"Path to file", ",");
If you modify the comment I added inside the try/catch, you can filter out the rows you aren't interested in. For example, to skip the rows of hyphens, you can add a simple check like this (currentRow is a string[], so the check looks at its first field):
if (currentRow.Length > 0 && currentRow[0].StartsWith("-----"))
{
    continue;
}
else
{
    //If/else statement from the original code that adds the data to a DataRow and then adds it to the DataTable
}
Then you can simply add more checks like this one to include or exclude certain rows in your file. Good luck!
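The filtering rules themselves are independent of SSIS and C#. As a rough illustration only, here is the same idea sketched in Java: keep the first header, drop separator lines made of hyphens, drop repeated headers, and drop anything that does not have the same number of fields as the header. That last rule is an assumption about what the "random rows" look like, and the class and method names are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class RowFilter {
    /**
     * Returns the header (once) followed by the data rows. Blank lines,
     * hyphen-only separator lines, repeated headers, and rows whose field
     * count differs from the header are skipped.
     */
    public static List<String[]> filter(List<String> lines, String delimiter) {
        List<String[]> kept = new ArrayList<>();
        String[] header = null;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.matches("-+")) {
                continue; // blank line or separator row of hyphens
            }
            // Quote the delimiter so characters like '|' are not treated as regex syntax
            String[] fields = line.split(Pattern.quote(delimiter), -1);
            if (header == null) {
                header = fields; // the first usable row is taken as the header
                kept.add(fields);
            } else if (Arrays.equals(fields, header)) {
                continue; // same header row repeated later in the file
            } else if (fields.length == header.length) {
                kept.add(fields); // looks like a data row
            }
            // Assumption: anything else is one of the unwanted "random" rows
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "id,name", "----------", "1,Ann", "some random text", "id,name", "2,Bob");
        for (String[] row : filter(lines, ",")) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

In the C# function above, the equivalent checks would go where the placeholder comment sits, before the row is added to the DataTable.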

Speech API (SAPI) floating point division by zero in C++ Builder on Windows 7

I use the following code to add text-to-speech to application controls for blind users in C++ Builder (a similar example can most likely be used in Delphi). The main form has the KeyPreview property checked so that pressing F11 speaks the active (focused) control. The code works as it is, but there are some problems. The example is C++ Builder code, but from what I've found Delphi suffers from the same problem and the solution is the same, so if you have a Delphi solution, feel free to post it.
#include <sapi.h>
#include <WTypes.h>
//---------------------------------------------------------------------------
// Speak text string (synchronous function)
//---------------------------------------------------------------------------
bool SpeakText(UnicodeString Text)
{
ISpVoice* pVoice = NULL;
if (FAILED(::CoInitialize(NULL))) return false;
Word Saved8087CW = Default8087CW; // Disable floating point division by zero exception caused by Speak
Set8087CW(0x133f);
HRESULT hr = CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL, IID_ISpVoice, (void **)&pVoice);
if (SUCCEEDED(hr))
{
//pVoice->SpeakCompleteEvent()
//pVoice->SetSyncSpeakTimeout(1000);
hr = pVoice->Speak(WideString(Text).c_bstr(), SPF_DEFAULT, NULL);
pVoice->Release();
pVoice = NULL;
}
Set8087CW(Saved8087CW);
::CoUninitialize();
return true;
}
//---------------------------------------------------------------------------
void __fastcall TForm1::FormKeyUp(TObject *Sender, WORD &Key, TShiftState Shift)
{
UnicodeString Speaker;
if (Key == VK_F11)
{
if (Screen->ActiveControl->InheritsFrom(__classid(TButton))) { Speaker += "Button, " + static_cast<TButton*>(Screen->ActiveControl)->Caption + "."; }
else if (Screen->ActiveControl->InheritsFrom(__classid(TEdit))) { Speaker += "Edit box, " + static_cast<TEdit*>(Screen->ActiveControl)->Text + "."; }
}
if (Speaker != "") SpeakText(Speaker);
}
//---------------------------------------------------------------------------
Problems:
1. pVoice->Speak causes a floating point division by zero if I don't override the exception using the Set8087CW function. This happens only on Windows 7 (possibly Vista and Windows 8 too) but not on Windows XP with the same program (compiled exe). Is there a solution without using Set8087CW? Removing these lines brings back the exception. I have BCB2010.
2. The function is synchronous and won't shut up or return control to the program until it finishes reading the text. This is a problem for longer text. It also blocks program events. Is there a way to make it asynchronous, or to introduce an event that periodically checks the F11 key status so that pressing F11 again stops the reading and uninitializes the object? For example, poll every 300 ms (or after each word, etc.) for an F11 key press and, if pressed, stop speaking? Or run it threaded?
3. Does SAPI have memory leaks, as some write on various sites?
4. Can the above code use OleCheck instead of CoCreateInstance and CoUninitialize?
UPDATE for those looking for solution as suggested by Remy Lebeau:
SavedCW = Get8087CW();
Set8087CW(SavedCW | 0x4);
hr = pVoice->Speak(WideString(Text).c_bstr(), SPF_DEFAULT | SPF_ASYNC, NULL);
pVoice->WaitUntilDone(-1); // Waits until text is done... if F11 is pressed simply go out of scope and speech will stop
Set8087CW(SavedCW);
Also found detailed example in CodeRage 4 session: http://cc.embarcadero.com/item/27264
1. The error does occur in Vista as well. Masking floating point exceptions is the only solution.
2. To make Speak() run asynchronously, you need to include the SPF_ASYNC flag when calling it. If you need to detect when asynchronous speaking is finished, you can use ISpVoice::WaitUntilDone(), or call ISpVoice::SpeakCompleteEvent() and pass the returned HANDLE to one of the WaitFor...() family of functions, like WaitForSingleObject().
3. What kind of leaks do other sites talk about?
4. Not instead of, no. OleCheck() merely checks the value of an HRESULT and throws an exception if it is an error value. You still have to call COM functions that return the actual HRESULT values in the first place. If anything, OleCheck() would be a replacement for SUCCEEDED() instead.
For what you are attempting, I would suggest the following approach instead:
struct s8087CW
{
Word Saved8087CW;
s8087CW(Word NewCW)
{
Saved8087CW = Default8087CW;
Set8087CW(NewCW);
// alternatively, the VCL documentation says to use SetExceptionMask() instead of Set8087CW() directly...
}
~s8087CW()
{
Set8087CW(Saved8087CW);
}
};
//---------------------------------------------------------------------------
__fastcall TForm1::TForm1(TComponent *Owner)
: TForm(Owner)
{
::CoInitialize(NULL);
}
//---------------------------------------------------------------------------
__fastcall TForm1::~TForm1()
{
if (pVoice) pVoice->Release();
::CoUninitialize();
}
//---------------------------------------------------------------------------
void __fastcall TForm1::FormKeyUp(TObject *Sender, WORD &Key, TShiftState Shift)
{
if (Key == VK_F11)
{
TWinControl *Ctrl = Screen->ActiveControl;
if (Ctrl)
{
TButton *btn;
TEdit *edit;
if ((btn = dynamic_cast<TButton*>(Ctrl)) != NULL)
SpeakText("Button, " + btn->Caption);
else if ((edit = dynamic_cast<TEdit*>(Ctrl)) != NULL)
SpeakText("Edit box, " + edit->Text);
}
}
}
//---------------------------------------------------------------------------
ISpVoice* pVoice = NULL;
bool __fastcall TForm1::SpeakText(const String &Text)
{
s8087CW cw(0x133f);
if (!pVoice)
{
if (FAILED(CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL, IID_ISpVoice, (void **)&pVoice)))
return false;
}
SPVOICESTATUS stat;
pVoice->GetStatus(&stat, NULL);
while (stat.dwRunningState == SPRS_IS_SPEAKING)
{
ULONG skipped;
pVoice->Skip(L"SENTENCE", 1000, &skipped);
pVoice->GetStatus(&stat, NULL);
}
return SUCCEEDED(pVoice->Speak(WideString(Text).c_bstr(), SPF_ASYNC, NULL));
}

Create a Non-Database-Driven Lookup

Lots of references for creating lookups out there, but all seem to draw their values from a query.
I want to add a lookup to a field that will add items from a list of values that do not come from a table, query, or any other data source.
Such as from a string: "Bananas, Apples, Oranges"
..or a container ["Bananas", "Apples", "Oranges"]
Assume the string/container is a dynamic object. Drawing from a static enum is not an option.
Is there a way to create lookups on the fly from something other than a data source?
Example code would be a great help, but I'll take hints as well.
There is the color picker.
Also, in the Global class you will find pickXxxx methods such as pickList.
There are others: pickUser, pickUserGroup, etc.
Take a look at the implementation. I guess they build a temporary table and then display that. Tables are great!
Update:
To go on your own, follow the rules.
For the advanced user, see also: Lookup form returning more than one value.
public void lookup()
{
SysTableLookup sysTableLookup;
TmpTableFieldLookup tmpTableFieldLookup;
Enumerator en;
List entitylist = new list(types::String);
entitylist.addend("Banana");
entitylist.addend("Apple");
en = entityList.getEnumerator();
while (en.moveNext())
{
tmpTableFieldLookup.TableName = en.current();
tmpTableFieldLookup.insert();
}
sysTableLookup = SysTableLookup::newParameters(tableNum(tmpTableFieldLookup), this);
sysTableLookup.addLookupfield(fieldNum(TmpTableFieldLookup, TableName));
//BP Deviation documented
sysTableLookup.parmTmpBuffer(tmpTableFieldLookup);
sysTableLookup.performFormLookup();
}
The above code helps in displaying strings as lookup.
I'm also guessing there's no way to perform a lookup without a table. I say that because a lookup is simply a form with one or more datasources that is displayed in a different way.
I've also blogged about this, so you can get some info on how to perform a lookup, even with a temporary table, here:
http://devexpp.blogspot.com.br/2012/02/dynamics-ax-custom-lookup.html
Example from global::PickEnumValue:
static int pickEnumValue(EnumId _enumId, boolean _omitZero = false)
{
Object formRun;
container names;
container values;
int i,value = -1,valueIndex;
str name;
#ResAppl
DictEnum dictEnum = new DictEnum(_enumId);
;
if (!dictEnum)
return -1;
for (i=1;i<=dictEnum.values();i++)
{
value = dictEnum.index2Value(i);
if (!(_omitZero && (value == 0)))
{
names += dictEnum.index2Label(i);
values += value;
}
}
formRun = classfactory.createPicklist();
formRun.init();
formRun.choices(names, #ImageClass);
formRun.caption(dictEnum.label());
formRun.run();
formRun.wait();
name = formRun.choice();
value = formRun.choiceInt();
if (value>=0) // the picklist form returns -1 if a choice has not been made
{
valueIndex = -1;
for (i=1;i<=conLen(names);i++)
{
if (name == conPeek(names,i))
{
valueIndex = i;
break;
}
}
if (valueIndex>=0)
return conPeek(values,valueIndex);
}
return value;
}
It isn't the most graceful solution, but this does work, and it doesn't override or modify any native AX 2012 objects:
Copy the sysLookup form from AX2009 (rename it) and import it into AX 2012.
We'll call mine myLookupFormCopy.
I did a find/replace of "sysLookup" in the XPO file to rename it.
Create this class method:
public static client void lookupList(FormStringControl _formStringControl, List _valueList, str _columnLabel = '')
{
Args args;
FormRun formRun;
;
if (_formStringControl && _valueList && _valueList.typeId() == Types::String)
{
args = new Args(formstr(myLookupFormCopy));
args.parmObject(_valueList);
args.parm(_columnLabel);
formRun = classFactory.formRunClass(args);
_formStringControl.performFormLookup(formRun);
}
}
In the lookup method for your string control, use:
public void lookup()
{
List valueList = new List(Types::String);
;
...build your valueList here...
MyClass::lookupList(this, valueList, "List Title");
super();
}