I have a big text file that looks like:
Mitchel-2
Anna-2
Witold-4
Serena-3
Serena-9
Witros-3
I need the first word before the "-" to never be duplicated. Is there any way to remove all lines except the first one for each word? For example, if I have 3000 lines starting with "Serena", each with a different number after the "-", is there a way to remove 2999 of them and keep only the first?
Serena is just an example; I have over 200 other words that are duplicated.
I don't think you can do it with Notepad++. You could use a regex for every name, but since you have over 200, that would be impractical.
But you can write a program that does it for you. Basically you go through 2 steps:
1) You search for every unique name and save it in a set (which doesn't allow duplicate entries).
2) For every unique name in the set, you search for the duplicates in the file.
I've written a simple C++ program that finds the duplicates in a string variable. You can adapt it to a language of your preference. I compiled it with Microsoft Visual Studio Community 2015 (it doesn't work in cpp.sh).
#include "stdafx.h"
#include <regex>
#include <string>
#include <iostream>
#include <set>
using namespace std;
int main()
{
typedef match_results<const char*> cmatch;
set<string> names;
string notepad_text = "Serena-1\nSerena-2\nSerena-3\nSerena-4\nAna-1\nSerena-5\nWilson-1\nAna-2\nJohn-1\nAna-3\nJohn-2\nWilson-2";
regex regex_find_names("^\\w+"); //double slashes are needed because this is in a string
// 1) Let's find every name
//sregex_iterator it_beg(notepad_text.begin(), notepad_text.end(), regex_find_names);
sregex_iterator find_names_itit(notepad_text.begin(), notepad_text.end(), regex_find_names);
sregex_iterator it_end; //defaults to the end condition
while (find_names_itit != it_end) {
names.insert(find_names_itit->str()); //automatically deletes duplicates
++find_names_itit;
}
// 2) For demonstration purposes, let's print what we've found
cout << "---printing the names we've found:\n\n";
set<string>::const_iterator names_it; // declare an iterator
names_it = names.begin(); // assign it to the start of the set
while (names_it != names.end()) // while it hasn't reach the end
{
cout << *names_it << " ";
++names_it;
}
// 3) Let's find the duplicates
cout << "\n\n---printing the regex matches:\n";
string current_name;
set<string>::const_iterator current_name_it; //this iterates over every name we've found
current_name_it = names.begin();
while (current_name_it != names.end())
{
// we're building something like "^Serena-.*"; the '-' keeps "Ana" from also matching "Anabel-1"
current_name = "^";
current_name += *current_name_it;
current_name += "-.*";
cout << "\n-Lets find duplicates of: " << *current_name_it << endl;
++current_name_it;
// let's iterate through the matches
regex regex_obj(current_name); // pattern built above for the current name
sregex_iterator it_beg(notepad_text.begin(), notepad_text.end(), regex_obj);
sregex_iterator it(notepad_text.begin(), notepad_text.end(), regex_obj); //this iterates over the match results
sregex_iterator it_end;
//string res = *it;
while (it != it_end) {
if (it != it_beg)
{
cout << it->str() << endl;
}
++it;
}
}
int i; // depending on the compiler, reading this extra input is needed to keep the console window open
cin >> i;
return 0;
}
Input string is:
Serena-1
Serena-2
Serena-3
Serena-4
Ana-1
Serena-5
Wilson-1
Ana-2
John-1
Ana-3
John-2
Wilson-2
Here it prints
---printing the names we've found:
Ana John Serena Wilson
---printing the regex matches:
-Lets find duplicates of: Ana
Ana-2
Ana-3
-Lets find duplicates of: John
John-2
-Lets find duplicates of: Serena
Serena-2
Serena-3
Serena-4
Serena-5
-Lets find duplicates of: Wilson
Wilson-2
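If the goal is only to keep the first line for each name, a single pass over the lines may be enough, with no regex at all. The following is a minimal sketch under a few assumptions: the name is everything before the first "-", a line without "-" counts as a name on its own, and the function names are made up for illustration.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <unordered_set>

// Copies lines from `in` to `out`, keeping only the first line for each name.
// The name is everything before the first '-'. A line without '-' is treated
// as a name on its own (an assumption; the question does not cover that case).
void keep_first_per_name(std::istream& in, std::ostream& out)
{
    std::unordered_set<std::string> seen;
    std::string line;
    while (std::getline(in, line)) {
        const std::string name = line.substr(0, line.find('-'));
        // insert() reports through .second whether the name was new
        if (seen.insert(name).second) {
            out << line << '\n';
        }
    }
}

// Convenience wrapper for in-memory text (hypothetical helper for illustration).
std::string dedupe_text(const std::string& text)
{
    std::istringstream in(text);
    std::ostringstream out;
    keep_first_per_name(in, out);
    return out.str();
}
```

For a real file, the same function can be called with a std::ifstream for the input and a std::ofstream for a new output file. The order of the kept lines is the order in which each name first appears.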
I am a complete beginner. I have a C++ command-line program that reads an HTML/XML input file through an ifstream file("C:/.......").
The main problem is that I want to take some text from this file and put it into a variable. The difficulty is that I only want part of the file: for instance, in var1 I want the text between the <name> tags, or between whichever tag I choose.
I have already tried getline with conditions and moving the stream position, but I get either all of the text or nothing.
// here is the part of the code that I'm sure of
string info(""), line(""), system("");
ifstream file("C:/Users/[...]/file.xml");
if (file.is_open())
{
while (getline(file, line))
{
cout << line << endl;
}
file.close();
}
else
cout << "file is not open" << endl;
After that I want to use the variable holding the text. Sorry for any mistakes in my English or my code, and thanks in advance for any clues.
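One possible way to pull out the text between a chosen pair of tags is a plain string search with std::string::find. This is only a sketch, not a real XML parser: it assumes the tags carry no attributes and are not nested, and the function name text_between_tags is made up for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the text found between every <tag> and </tag> pair in `content`.
// Assumes tags have no attributes and are not nested (plain string search).
std::vector<std::string> text_between_tags(const std::string& content,
                                           const std::string& tag)
{
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    std::vector<std::string> result;
    std::size_t pos = 0;
    while ((pos = content.find(open, pos)) != std::string::npos) {
        const std::size_t start = pos + open.size();
        const std::size_t end = content.find(close, start);
        if (end == std::string::npos) break; // opening tag without a closing tag: stop
        result.push_back(content.substr(start, end - start));
        pos = end + close.size(); // continue searching after this pair
    }
    return result;
}
```

To use it with the file from the question, the lines read by getline can be appended to one std::string (adding '\n' after each) and that string passed as content, with "name" as the tag.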
I have written a Qt 5 application that creates a monthly rota/timesheet based on various inputs. It generates a CSV file that I can read and print with Excel. I can use LibreOffice to print this onto a single A4 sheet.
However, what I would really like to do is use Qt to print the table directly to the printer.
I am confused as to how best to achieve this. I have used HTML with a QTextDocument to successfully print out the rota/timesheet. However, the result ends up on two pages rather than one. I print it in landscape mode. I think it would be good to scale the height of the document down so that it fits on one page.
void ViewEditRotaDialog::m_printButtonSlot()
{
QString strStream;
QTextStream out(&strStream);
const int rowCount = m_tableWidget->rowCount();
const int columnCount = m_tableWidget->columnCount();
out << "<html>\n"
"<head>\n"
"<meta Content=\"Text/html; charset=Windows-1251\">\n"
<< QString("<title>%1</title>\n").arg("ROTA")
<< "</head>\n"
"<body bgcolor=#ffffff link=#5000A0>\n"
"<table border=1 cellspacing=0 cellpadding=2>\n";
// headers
out << "<thead><tr bgcolor=#f0f0f0>";
for (int column = 0; column < columnCount; column++)
out << QString("<th>%1</th>").
arg(m_tableWidget->horizontalHeaderItem(column)->text());
out << "</tr></thead>\n";
// data table
for (int row = 0; row < rowCount; row++)
{
out << "<tr>";
for (int column = 0; column < columnCount; column++)
{
QString data =
m_tableWidget->item(row,column)->text().simplified();
out << QString("<td bkcolor=0>%1</td>").
arg((!data.isEmpty()) ? data : QString(" "));
}
out << "</tr>\n";
}
out << "</table>\n"
"</body>\n"
"</html>\n";
QTextDocument *document = new QTextDocument();
document->setHtml(strStream);
QPrinter printer(QPrinter::HighResolution);
printer.setOrientation(QPrinter::Landscape);
printer.setPageMargins(0.1,0.1,0.1,0.1,QPrinter::Millimeter);
printer.setFullPage(true);
QPrintDialog *dialog = new QPrintDialog(&printer, NULL);
if (dialog->exec() != QDialog::Accepted)
return;
document->print(&printer);
delete document;
}
I have seen other examples that use QPainter and scale the output.
Should I be doing this and using drawContents(), or should I be using a completely different method?
I decided to experiment with QPainter and drawContents(). I was pleased that I could get it to do what I needed with minimal effort. I don't yet fully understand the details of how this works, but I will look into it later. I may need to enhance this, but it looks very good for what I need. Simply put, it looks like I just needed to change the scales to get the result I required. Never having used Qt for printing before, I didn't really know how best to do this, but I am happy with the result.
I replaced the code under
QTextDocument *document = new QTextDocument();
with
document->setHtml(strStream);
QPrinter printer(QPrinter::HighResolution);
printer.setPaperSize(QPrinter::A4);
printer.setOrientation(QPrinter::Landscape);
printer.setPageMargins(0.1,0.1,0.1,0.1,QPrinter::Millimeter);
printer.setFullPage(true);
QPrintDialog *dialog = new QPrintDialog(&printer, NULL);
if (dialog->exec() != QDialog::Accepted)
return;
QPainter painter;
painter.begin(&printer);
double xscale = printer.pageRect().width() / document->size().width();
double yscale = printer.pageRect().height() / document->size().height();
painter.translate(printer.paperRect().x() + printer.pageRect().width() / 2,
printer.paperRect().y() + printer.pageRect().height() / 2);
painter.scale(xscale, yscale);
painter.translate(-document->size().width() / 2,
-document->size().height() / 2);
document->drawContents(&painter);
painter.end();
delete document;
}
This may not be the best answer, but it works so far.
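Scaling x and y independently fills the page but can stretch the table. If the proportions should be preserved instead, one option is to use a single factor, the smaller of the two ratios. The sketch below shows only that arithmetic and is kept free of Qt types so that it compiles on its own; fit_scale is a hypothetical helper, not a Qt function.

```cpp
#include <algorithm>

// Returns one scale factor that fits a document of size docW x docH inside a
// page of size pageW x pageH without distorting it. All sizes must be
// positive and expressed in the same unit.
double fit_scale(double pageW, double pageH, double docW, double docH)
{
    // The smaller ratio is the one that keeps both dimensions on the page.
    return std::min(pageW / docW, pageH / docH);
}
```

In the code above, the result would be computed from printer.pageRect() and document->size() and passed to painter.scale() for both arguments.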
The goal is to obtain a QDomDocument, or something similar, holding the content of an HTML (not XML) document.
The problem is that some tags, especially <script>, trigger errors:
<!DOCTYPE html>
<html>
<head>
<script type="text/javascript">
var a = [1,2,3];
var b = (2<a.length);
</script>
</head>
<body/>
</html>
Not well formed: Element type "a.length" must be followed by either attribute specifications, ">" or "/>".
I understand that HTML is not the same as XML, but it seems reasonable to expect Qt to have a solution for this, such as:
Setting the parser to accept HTML
Another class for HTML
A way to mark the content of some tags as CDATA.
My current attempt only achieves normal XML parsing:
QString mainHtml;
{
QFile file("main.html");
if (!file.open(QIODevice::ReadOnly)) qDebug() << "Error reading file main.html";
QTextStream stream(&file);
mainHtml = stream.readAll();
file.close();
}
QDomDocument doc;
QString errStr;
int errLine=0, errCol=0;
doc.setContent( mainHtml, false, &errStr, &errLine, &errCol);
if (!errStr.isEmpty())
{
qDebug() << errStr << "L:" << errLine << ":" << errCol;
}
std::function<void(const QDomElement&, int)> printTags=
[&printTags](const QDomElement& elem, int tab)
{
QString space(3*tab, ' ');
QDomNode n = elem.firstChild();
for( ;!n.isNull(); n=n.nextSibling())
{
QDomElement e = n.toElement();
if(e.isNull()) continue;
qDebug() << space + e.tagName();
printTags( e, tab+1);
}
};
printTags(doc.documentElement(), 0);
Note: I would like to avoid including the full webkit for this.
I recommend using htmlcxx. It is licensed under the LGPL. It works on Linux and Windows. If you use Windows, compile it with MSYS.
To compile it, just extract the files and run
./configure --prefix=/usr/local/htmlcxx
make
make install
In your .pro file add the include and library directory.
INCLUDEPATH += /usr/local/htmlcxx/include
LIBS += -L/usr/local/htmlcxx/lib -lhtmlcxx
Usage example
#include <iostream>
#include "htmlcxx/html/ParserDom.h"
#include <string>
#include <strings.h> // strcasecmp (POSIX, not part of standard C++)
int main (int argc, char *argv[])
{
using namespace std;
using namespace htmlcxx;
//Parse some html code
string html = "<html><body>heymyhome</body></html>";
HTML::ParserDom parser;
tree<HTML::Node> dom = parser.parseTree(html);
//Print whole DOM tree
cout << dom << endl;
//Dump all links in the tree
tree<HTML::Node>::iterator it = dom.begin();
tree<HTML::Node>::iterator end = dom.end();
for (; it != end; ++it)
{
if (strcasecmp(it->tagName().c_str(), "A") == 0)
{
it->parseAttributes();
cout << it->attribute("href").second << endl;
}
}
//Dump all text of the document
it = dom.begin();
end = dom.end();
for (; it != end; ++it)
{
if ((!it->isTag()) && (!it->isComment()))
{
cout << it->text() << " ";
}
}
cout << endl;
return 0;
}
Credits for the example:
https://github.com/bbxyard/sdk/blob/master/examples/htmlcxx/htmlcxx-demo.cpp
You can't use an XML parser for HTML. Either use htmlcxx, or convert the HTML to valid XML first. After that you are free to use QDomDocument, the Qt XML parsers, etc.
QWebEngine also has parsing functionality, but it adds a large overhead to the application.
I am downloading a web page and I am trying to extract some values from it.
The places of the page that I am interested in are of this type:
<a data-track=\"something\" href=\"someurl\" title=\"Heaven\"><img src=\"somesource.jpg\" /></a>
and I need to extract the href (someurl) value. Note that there are multiple entries like the one above in the HTML string that I have and thus I will use a list to store all the URLs that I extract from the string.
This is what I've tried so far:
QString html_str=myfile();
QRegExp regex("<a data-track\\=\"something\" href\\=\".*(?=\" title)");
if(regex.indexIn(html_str) != -1){
QStringList list;
QString str;
list = regex.capturedTexts();
foreach(str,list)
qDebug() << str.remove("<a data-track=\"something\" href=\"");
}
With the above code I get only one occurrence (list.count() == 1) which contains the whole HTML string from the first occurrence of someurl till the end of the file, without the <a data-track="something" href="" in it, which have all been removed.
I'd do it like this (make sure you double-check your regex):
QRegExp regex("<a data-track=\"something\" href=\"[^\"]*(?=\" title)"); // [^\"]* stops at the closing quote, unlike a greedy .*
if (regex.indexIn(html_str) != -1) qDebug() << regex.cap().remove("<a data-track=\"something\" href=\"");
You can use a while loop to walk through html_str one match at a time:
int pos = 0; // start searching at the beginning
while ((pos = regex.indexIn(html_str, pos)) != -1) { // parentheses matter: assign first, then compare
QStringList list;
list = regex.capturedTexts();
foreach(QString url, list) {
// do something
}
pos += regex.matchedLength(); // continue after the current match
}
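The same loop can be sketched without Qt. The version below swaps in std::regex from the C++ standard library, not QRegExp, so that it compiles on its own. It also uses a capture group so that no prefix has to be removed afterwards. The function name extract_hrefs is made up for illustration, and the pattern assumes the attributes always appear in the order shown in the question.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects the href value of every <a data-track="something" href="..." title=...> link.
// [^"]* cannot run past the closing quote, so each match covers a single link.
std::vector<std::string> extract_hrefs(const std::string& html)
{
    static const std::regex link("<a data-track=\"something\" href=\"([^\"]*)\" title");
    std::vector<std::string> urls;
    for (std::sregex_iterator it(html.begin(), html.end(), link), end; it != end; ++it) {
        urls.push_back((*it)[1].str()); // capture group 1 is the URL
    }
    return urls;
}
```

With a QString as input, toStdString() gives a std::string that can be passed to this function.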
I have a column of WKT POLYGON values in MySQL (I inherited the db). The polys are queried and rendered on Google Maps. Since Google Maps polygon overlay requires an array of points, the previous user converted the WKT values to coordinate pairs and stored them in another column. This actually works rather well, but not well enough.
For one, the conversion was occasionally faulty, and for two, I am looking for ways to make this faster.
Re. the first issue, I have to re-implement this, and am looking for a converter that will convert a WKT poly into a string of coordinates. I am thinking I could use this to either write a stored procedure that will query the WKT column and spit out a string of JSON text that could be readily converted to Google Maps polys, or even preprocess all the WKT polys and store them as text like it is already done, but this time with correct values.
So, I am really looking for a function to convert WKT to a string of its constituent point coordinates, kinda like so
SELECT AsStringOfCoords(WKT_Column) FROM table WHERE condition
where AsStringOfCoords() would be my custom function.
I wrote a small C++ program that converts MySQL WKT polygons to KML polygons.
It works as follows:
Read the information from the database
Create a KML document
Rearrange the information and print it out to the file.
You can load the new KML file in Google Maps and it displays nicely.
Here is the source code:
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
/*
* Database includes...
*/
#include <mysql_connection.h>
#include <cppconn/driver.h>
#include <cppconn/exception.h>
#include <cppconn/resultset.h>
#include <cppconn/statement.h>
#include <cppconn/prepared_statement.h>
#include "../iolib/IOCoreFuncs.h"
#include "../iolib/ioconfigurador.h"
using namespace std;
using namespace sql;
using namespace IOCore;
sql::Connection * conectaDB(string dbSvr, string dbUsr, string dbPwd, string dbNombre);
int main(int argc, char **argv) {
string qry, arproc;
Connection * dbCon;
Statement * stmt;
IOConfigurador * miConf;
ResultSet * rs;
// Load configuration...
if (argc == 3) {
arproc = argv[2];
} else {
cout << "Usage: sqltokml <polygon id> <kml file to export>\n";
return 1;
}
dbCon = conectaDB("dbserver", "dbuser", "dbpasswd", "dbname");
stmt = dbCon->createStatement();
qry = string("SELECT name, astext(geoarea) from ") + "table name" + " where id = '" + argv[1] + "';"; // "table name" is a placeholder; argv[1] is inserted unescaped
rs = stmt->executeQuery(qry);
if (rs->rowsCount() > 0) {
string polnombre, polcoords;
string salida;
while (rs->next()) {
ofstream sale;
polnombre = rs->getString(1);
polcoords = rs->getString(2);
salida = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
"<kml xmlns=\"http://www.opengis.net/kml/2.2\" xmlns:gx=\"http://www.google.com/kml/ext/2.2\" xmlns:kml=\"http://www.opengis.net/kml/2.2\" xmlns:atom=\"http://www.w3.org/2005/Atom\">\n"
"<Document>\n"
"<name>" + polnombre + ".kml</name>\n"
"<Style id=\"sh_ylw-pushpin3\">\n"
"<IconStyle>\n"
"<scale>1.3</scale>\n"
"<Icon>\n"
"<href>http://maps.google.com/mapfiles/kml/pushpin/ylw-pushpin.png</href>\n"
"</Icon>\n"
"<hotSpot x=\"20\" y=\"2\" xunits=\"pixels\" yunits=\"pixels\"/>\n"
"</IconStyle>\n"
"<LineStyle>\n"
"<color>467f5500</color>\n"
"<width>3</width>\n"
"</LineStyle>\n"
"<PolyStyle>\n"
"<color>46ff5555</color>\n"
"</PolyStyle>\n"
"</Style>\n"
"<StyleMap id=\"msn_ylw-pushpin10\">\n"
"<Pair>\n"
"<key>normal</key>\n"
"<styleUrl>#sn_ylw-pushpin30</styleUrl>\n"
"</Pair>\n"
"<Pair>\n"
"<key>highlight</key>\n"
"<styleUrl>#sh_ylw-pushpin3</styleUrl>\n"
"</Pair>\n"
"</StyleMap>\n"
"<Style id=\"sn_ylw-pushpin30\">\n"
"<IconStyle>\n"
"<scale>1.1</scale>\n"
"<Icon>\n"
"<href>http://maps.google.com/mapfiles/kml/pushpin/ylw-pushpin.png</href>\n"
"</Icon>\n"
"<hotSpot x=\"20\" y=\"2\" xunits=\"pixels\" yunits=\"pixels\"/>\n"
"</IconStyle>\n"
"<LineStyle>\n"
"<color>467f5500</color>\n"
"<width>3</width>\n"
"</LineStyle>\n"
"<PolyStyle>\n"
"<color>46ff5555</color>\n"
"</PolyStyle>\n"
"</Style>\n"
"<Folder>\n"
"<name>" + polnombre + "</name>\n"
"<Placemark>\n"
"<name>" + polnombre + "</name>\n"
"<styleUrl>#msn_ylw-pushpin10</styleUrl>\n"
"<Polygon>\n"
"<tessellate>1</tessellate>\n"
"<outerBoundaryIs>\n"
"<LinearRing>\n"
"<coordinates>\n";
// Coordinates transformation...
polcoords = polcoords.substr(9, polcoords.size() - 11);
vector< string > lascoords = split(polcoords, ",");
for (unsigned i = 0; i < lascoords.size(); i++) {
salida += lascoords[i].substr(0, lascoords[i].find(" ")) + ",";
salida += lascoords[i].substr(lascoords[i].find(" ") + 1) + ",0 ";
}
salida += "\n</coordinates>\n"
"</LinearRing>\n"
"</outerBoundaryIs>\n"
"</Polygon>\n"
"</Placemark>\n"
"</Folder>\n"
"</Document>\n"
"</kml>";
sale.open(arproc.c_str(), ios::out | ios::app);
sale << salida ;
sale.close();
}
}
rs->close();
stmt->close();
dbCon->close();
}
sql::Connection * conectaDB(string dbSvr, string dbUsr, string dbPwd, string dbNombre)
{
sql::Connection * retval;
sql::Driver *ctrl;
try {
ctrl = get_driver_instance();
retval = ctrl->connect(dbSvr, dbUsr, dbPwd);
retval->setSchema(dbNombre);
} catch (sql::SQLException &err) {
cout<<"Errors... :( "<<err.what()<<"\ngoing out\n";
retval = 0;
}
return retval;
}
I hope this helps. It is easy to translate this into a MySQL stored procedure, or to use it from PHP or another language. I also have some PHP/JavaScript scripts that do the same thing with points.
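The coordinate step on its own, without the database or the KML template, can be sketched as a small function that turns the WKT text into numeric pairs. This is a minimal sketch under stated assumptions: only the first (outer) ring is read, holes are ignored, and the function name wkt_polygon_points is made up for illustration.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Parses the outer ring of a WKT polygon such as
// "POLYGON((30 10,40 40,20 40,30 10))" into (x, y) pairs.
// Only the first ring is read; an input without "((" and ")" yields an empty vector.
std::vector<std::pair<double, double>> wkt_polygon_points(const std::string& wkt)
{
    std::vector<std::pair<double, double>> points;
    const std::size_t open = wkt.find("((");
    const std::size_t close = wkt.find(')', open);
    if (open == std::string::npos || close == std::string::npos) return points;

    // Text between "((" and the first ')' is "x y,x y,..."
    std::istringstream ring(wkt.substr(open + 2, close - open - 2));
    std::string vertex;
    while (std::getline(ring, vertex, ',')) {
        std::istringstream fields(vertex);
        double x = 0.0, y = 0.0;
        if (fields >> x >> y) points.emplace_back(x, y); // skip malformed vertices
    }
    return points;
}
```

The pairs come out in the order in which they are stored in the WKT. Which of the two values is the latitude depends on how the data was written, so that needs checking before the points are handed to Google Maps.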