Deleting duplicate data in MySQL - mysql

I'm trying to emulate the accepted answer in this SO question: Delete all Duplicate Rows except for One in MySQL? [duplicate], with a twist: I want the data (auto-incrementing IDs) of one table to determine which rows to delete in another table. SQLFiddle here showing data.
In the fiddle referenced above, the end result I'm looking for is the rows in eventdetails_new with Event_ID = 4 & 6 to be deleted (EVENTDETAILS_IDs 5 & 6, and 9 & 10), leaving Event_IDs 3 & 5 (EVENTDETAILS_IDs 3 & 4 and 7 & 8). I hope that made sense. Ideally the rows in events_new with those same Event_IDs would get deleted as well (which I haven't started working on yet, so no code samples).
This is the query I'm trying to make work, but I'm a bit over my head:
SELECT *
FROM eventdetails_new AS EDN1, eventdetails_new AS EDN2
INNER JOIN events_new AS E1 ON `E1`.`Event_ID` = `EDN1`.`Event_ID`
INNER JOIN events_new AS E2 ON `E2`.`Event_ID` = `EDN2`.`Event_ID`
WHERE `E1`.`Event_ID` > `E2`.`Event_ID`
AND `E1`.`DateTime` = `E2`.`DateTime`
AND events_new.EventType_ID = 6;
Here's the same SQLFiddle with the results of this query. Not good. I can see the Event_ID in the data, but the query cannot for some reason. Not sure how to proceed to fix this.
I know it's a SELECT query, but I couldn't figure out a way to have two aliased tables in the DELETE query (which I think I need?). I figured if I could get a selection, I could delete it with some C# code. However ideally it could all be done in a single query or set of statements without having to go outside of MySQL.
Here's my first cut at the query, but it's just as bad:
DELETE e1 FROM eventdetails_new e1
WHERE `events_new`.`Event_ID` > `events_new`.`Event_ID`
AND events_new.DateTime = events_new.DateTime AND events_new.EventType_ID = 6;
SQLFiddle won't let me run this query at all, so it's not much help. However, it gives me the same error as the one above: Error Code: 1054. Unknown column 'events_new.Event_ID' in 'where clause'
I'm by no means married to either of these queries if there's a better way. The end result I'm looking for is deleting a bunch of duplicate data.
I have hundreds of thousands of these results, and I know that roughly 1/3 of them are duplicates that I need to get rid of before we go live with the database.

Here's what I eventually ended up doing. My co-worker & I came up with a query that would give us a list of Event_ID's that had duplicate data (we actually used Access 2010's query builder and MySQL-ified it). Bear in mind this is a complete solution where the original question didn't have as much detail as far as linked tables. If you've got questions about this, feel free to ask & I'll try to help:
SELECT `Events_new`.`Event_ID`
FROM Events_new
GROUP BY `Events_new`.`PCBID`, `Events_new`.`EventType_ID`, `Events_new`.`DateTime`, `Events_new`.`User`
HAVING (((COUNT(`Events_new`.`PCBID`)) > 1) AND ((COUNT(`Events_new`.`User`)) > 1) AND ((COUNT(`Events_new`.`DateTime`)) > 1))
From this I processed each Event_ID to remove the duplicates iteratively. Basically, I had to delete all the child rows starting from the lowest-level table so that I didn't run afoul of foreign key constraints.
This chunk of code was written in LINQPad as C# statements (sbCommonFunctions is an in-house DLL designed to make most, but not all, as you'll see, database functions be handled the same way or more easily):
sbCommonFunctions.Database testDB = new sbCommonFunctions.Database();
testDB.Connect("production", "database", "user", "password");
List<string> listEventIDs = new List<string>();
List<string> listEventDetailIDs = new List<string>();
List<string> listTestInformationIDs = new List<string>();
List<string> listTestStepIDs = new List<string>();
List<string> listMeasurementIDs = new List<string>();
string dtQuery = (String.Format(@"SELECT `Events_new`.`Event_ID`
FROM Events_new
GROUP BY `Events_new`.`PCBID`,
`Events_new`.`EventType_ID`,
`Events_new`.`DateTime`,
`Events_new`.`User`
HAVING (((COUNT(`Events_new`.`PCBID`)) > 1)
AND ((COUNT(`Events_new`.`User`)) > 1)
AND ((COUNT(`Events_new`.`DateTime`)) > 1))"));
int iterations = 0;
DataTable dtEventIDs = getDT(dtQuery, testDB);
while (dtEventIDs.Rows.Count > 0)
{
Console.WriteLine(dtEventIDs.Rows.Count);
Console.WriteLine(iterations);
iterations++;
foreach(DataRowView eventID in dtEventIDs.DefaultView)
{
listEventIDs.Add(eventID.Row[0].ToString());
DataTable dtEventDetails = testDB.QueryDatabase(String.Format(
"SELECT * FROM EventDetails_new WHERE Event_ID = {0}",
eventID.Row[0]));
foreach(DataRowView drvEventDetail in dtEventDetails.DefaultView)
{
listEventDetailIDs.Add(drvEventDetail.Row[0].ToString());
}
DataTable dtTestInformation = testDB.QueryDatabase(String.Format(
@"SELECT TestInformation_ID
FROM TestInformation_new
WHERE Event_ID = {0}",
eventID.Row[0]));
foreach(DataRowView drvTest in dtTestInformation.DefaultView)
{
listTestInformationIDs.Add(drvTest.Row[0].ToString());
DataTable dtTestSteps = testDB.QueryDatabase(String.Format(
@"SELECT TestSteps_ID
FROM TestSteps_new
WHERE TestInformation_TestInformation_ID = {0}",
drvTest.Row[0]));
foreach(DataRowView drvTestStep in dtTestSteps.DefaultView)
{
listTestStepIDs.Add(drvTestStep.Row[0].ToString());
DataTable dtMeasurements = testDB.QueryDatabase(String.Format(
@"SELECT Measurements_ID
FROM Measurements_new
WHERE TestSteps_TestSteps_ID = {0}",
drvTestStep.Row[0]));
foreach(DataRowView drvMeasurements in dtMeasurements.DefaultView)
{
listMeasurementIDs.Add(drvMeasurements.Row[0].ToString());
}
}
}
}
testDB.Disconnect();
string mysqlConnection =
"server=server;\ndatabase=database;\npassword=password;\nUser ID=user;";
MySqlConnection connection = new MySqlConnection(mysqlConnection);
connection.Open();
//start unwinding the duplicates from the lowest level upward
whackDuplicates(listMeasurementIDs, "measurements_new", "Measurements_ID", connection);
whackDuplicates(listTestStepIDs, "teststeps_new", "TestSteps_ID", connection);
whackDuplicates(listTestInformationIDs, "testinformation_new", "testInformation_ID", connection);
whackDuplicates(listEventDetailIDs, "eventdetails_new", "eventdetails_ID", connection);
whackDuplicates(listEventIDs, "events_new", "event_ID", connection);
connection.Close();
//update iterator from inside the clause in case there are more duplicates.
dtEventIDs = getDT(dtQuery, testDB); }
}//goofy curly brace to allow LinqPAD to deal with inline classes
public void whackDuplicates(List<string> listOfIDs,
string table,
string pkID,
MySqlConnection connection)
{
foreach(string ID in listOfIDs)
{
MySqlCommand command = connection.CreateCommand();
command.CommandText = String.Format(
"DELETE FROM " + table + " WHERE " + pkID + " = {0}", ID);
command.ExecuteNonQuery();
}
}
public DataTable getDT(string query, sbCommonFunctions.Database db)
{
return db.QueryDatabase(query);
//}/*this is deliberate, LinqPAD has a weird way of dealing with inline
classes and the last one can't have a closing curly brace (and the
first one has to have an extra opening curly brace above it, go figure)
*/
Basically this is a giant while loop, and the clause iterator is updated from inside the clause until the number of Event_ID's drops to zero (it takes 5 iterations, some of the data has as many as six duplicates).
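For anyone who wants to see the grouping idea without a database, here is a minimal Java sketch (Java because it is used elsewhere on this page). It is not the query or the C# loop above: it groups rows by the same four columns in memory and keeps the lowest Event_ID in each group in a single pass. The Event class and the sample rows are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DuplicateFinder {
    // Hypothetical in-memory stand-in for one row of events_new.
    static final class Event {
        final int eventId;
        final String pcbId;
        final int eventTypeId;
        final String dateTime;
        final String user;

        Event(int eventId, String pcbId, int eventTypeId, String dateTime, String user) {
            this.eventId = eventId;
            this.pcbId = pcbId;
            this.eventTypeId = eventTypeId;
            this.dateTime = dateTime;
            this.user = user;
        }
    }

    // Returns the Event_IDs to delete: every row whose
    // (PCBID, EventType_ID, DateTime, User) was already seen on a lower Event_ID.
    static List<Integer> idsToDelete(List<Event> events) {
        List<Event> sorted = new ArrayList<>(events);
        sorted.sort(Comparator.comparingInt((Event e) -> e.eventId));
        Set<List<Object>> seen = new HashSet<>();
        List<Integer> doomed = new ArrayList<>();
        for (Event e : sorted) {
            List<Object> key = Arrays.<Object>asList(e.pcbId, e.eventTypeId, e.dateTime, e.user);
            if (!seen.add(key)) {
                doomed.add(e.eventId); // a lower ID with the same key is kept
            }
        }
        return doomed;
    }

    public static void main(String[] args) {
        // Made-up sample data: 3/4 are duplicates of each other, as are 5/6.
        List<Event> rows = Arrays.asList(
            new Event(3, "PCB-A", 6, "2013-05-01 10:00:00", "alice"),
            new Event(4, "PCB-A", 6, "2013-05-01 10:00:00", "alice"),
            new Event(5, "PCB-B", 6, "2013-05-02 09:30:00", "bob"),
            new Event(6, "PCB-B", 6, "2013-05-02 09:30:00", "bob"));
        System.out.println(idsToDelete(rows)); // prints [4, 6]
    }
}
```

The resulting list would still have to be unwound child-first, as described above, to respect the foreign keys.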

Related

Error when using textbox values inside string sql

"Select item from table1 where Spare parts='"+ textbox1.text+"'".
I have tried to replace item with Textbox2.text.
I used :
"Select'"& textbox2.text& "' from table1 where Spare parts='"+ textbox1.text+"'"
I got error.
I used "+ textbox2.text+" I got error too
What you have here is one of the fastest ways out there to get your app hacked. It is NOT how you include user input in an SQL statement.
To explain the right way, I also need to include the connection and command objects for context, so I may also have a different pattern for how I handle these than you're used to. I'm also assuming the mysql tag in the question is accurate (though I have my doubts), such that the correct code looks more like this:
string SQL = "Select item from table1 where `Spare parts`= @SpareParts";
using (var cn = new MySqlConnection("connection string here"))
using (var cmd = new MySqlCommand(SQL, cn))
{
cmd.Parameters.AddWithValue("@SpareParts", TextBox1.Text);
cn.Open();
using (var rdr = cmd.ExecuteReader())
{
while (rdr.Read())
{
// ...
}
}
}
Note the backticks around Spare Parts, so it will be correctly treated as a single object name by MySql.
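The question also wants the selected column to come from a second text box. Placeholders can only stand in for values, not for identifiers such as column names, so one common approach is to check the requested name against a fixed list before putting it into the SQL text. Below is a minimal sketch of that idea in Java (used elsewhere on this page) rather than C#; the allowed column names are hypothetical, and the `?` is the JDBC-style placeholder for the spare-part value.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class SafeSelect {
    // Hypothetical whitelist; replace with the real column names of table1.
    private static final Set<String> ALLOWED_COLUMNS =
        new HashSet<>(Arrays.asList("item", "price", "quantity"));

    // Builds the SQL text. Only a whitelisted identifier is ever concatenated;
    // the spare-part value stays a bound parameter (the "?").
    static String buildQuery(String requestedColumn) {
        if (!ALLOWED_COLUMNS.contains(requestedColumn)) {
            throw new IllegalArgumentException("Unknown column: " + requestedColumn);
        }
        return "SELECT `" + requestedColumn + "` FROM table1 WHERE `Spare parts` = ?";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("item"));
        // prints SELECT `item` FROM table1 WHERE `Spare parts` = ?
    }
}
```

Anything the user types that is not in the list is rejected before it can reach the database.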

Primefaces Autocomplete from huge database not acting fast

I am using the PrimeFaces autocomplete component with POJOs, filled from a database table with a huge number of rows.
When I select values from a table that contains millions of entries (SELECT synonym FROM synonyms WHERE synonym like '%:query%'), autocomplete takes a very long time to find the word, and the table will only get bigger in the future.
Are there any suggestions for making autocomplete faster?
Limiting the number of rows is a great way to speed-up autocomplete. I'm not clear on why you'd limit to 1000 rows though: you can't show 1000 entries in a dropdown; shouldn't you be limiting to maybe 10 entries?
Based on your comments below, here is an example database query that you should be able to adapt to your situation:
String queryString = "select distinct b.title from Books b where b.title like :userValue";
Query query = entityManager.createQuery(queryString);
query.setParameter("userValue", userValue + "%");
query.setMaxResults(20);
List<String> results = query.getResultList();
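Note that the parameter above is a prefix pattern (userValue + "%"), not '%...%'. In MySQL, a B-tree index can generally be used for a LIKE pattern that does not start with a wildcard, because all matches sit next to each other in sorted order. The sketch below illustrates that idea in plain Java with a sorted set; it is an analogy for how a prefix lookup can stop early, not how the database is implemented, and the sample words are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

final class PrefixLookup {
    // Walks the sorted set starting at the prefix and stops at the first
    // non-matching entry or once `limit` results have been collected.
    static List<String> complete(NavigableSet<String> sorted, String prefix, int limit) {
        List<String> out = new ArrayList<>();
        for (String s : sorted.tailSet(prefix, true)) {
            if (!s.startsWith(prefix) || out.size() == limit) {
                break;
            }
            out.add(s);
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableSet<String> words =
            new TreeSet<>(Arrays.asList("dog", "care", "cat", "car", "card"));
        System.out.println(complete(words, "car", 20)); // prints [car, card, care]
    }
}
```

A "contains" search ('%car%') has no such starting point, so every entry has to be inspected.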
I finally went with a Solr index for fast requests, since my table will contain more than 4 million entries, which must be searched quickly and without consuming a lot of memory.
Here's my solution, in case someone else has the same problem.
public List<Synonym> completeSynonym(String query) {
List<Synonym> filteredSynonyms = new ArrayList<Synonym>();
// ResultSet result;
// SolrQuery solrQ=new SolrQuery();
String sUrl = "http://......solr/synonym_core";
SolrServer solr = new HttpSolrServer(sUrl);
ModifiableSolrParams parameters = new ModifiableSolrParams();
parameters.set("q", "*:*"); // query everything
parameters.set("fl", "id,synonym");// send back just the id
//and synonym values
parameters.set("wt", "json");// this in json format
parameters.set("fq", "synonym:\"" + query+"\"~0"); //my conditions
QueryResponse response;
try {
if (query.length() > 1) {
response = solr.query(parameters);
SolrDocumentList dl = response.getResults();
for (int i = 0; i < dl.size(); i++) {
Synonym s = new Synonym();
s.setSynonym_id((int) dl.get(i).getFieldValue("id"));
s.setSynonymName(dl.get(i).getFieldValue("synonym")
.toString());
filteredSynonyms.add(s);
}
}
} catch (SolrServerException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
return filteredSynonyms;
}

SQL WHERE LIKE clause in JSF managed bean

Hi, I have this managed bean that makes MySQL queries. The problem is that the SQL statement behaves like an '=' condition instead of 'LIKE'.
Here is the code in my managed bean.
Connection con = ds.getConnection();
try{
if (con == null) {
throw new SQLException("Can't get database connection");
}
}
finally {
PreparedStatement ps = con.prepareStatement(
"SELECT * FROM Clients WHERE Machine LIKE '53'");
//get customer data from database
ResultSet result = ps.executeQuery();
con.close();
List list;
list = new ArrayList();
while (result.next()) {
Customer cust = new Customer();
cust.setMachine(result.getLong("Machine"));
cust.setCompany(result.getString("Company"));
cust.setContact(result.getString("Contact"));
cust.setPhone(result.getLong("Phone"));
cust.setEmail(result.getString("Email"));
//store all data into a List
list.add(cust);
}
return list;
Here the SELECT command does not pull all the numbers in the 'Machine' column that are like 53, but if I enter a whole value, such as the complete number (53544), in place of 53, then the result is pulled up. I am confused!
Also if i replace the above select statement with SELECT * FROM Clients the entire database is stored in list. Any ideas ?
Use wildcards:
Like '%53%'
...means everything that contains '53'.
Like '%53' - it ends with 53
LIKE '53%' - it starts with 53
You can also use _ if you want to match exactly one character.
You can find a description here.
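To try the patterns out without a database, here is a small Java sketch that mimics LIKE by translating % to "any run of characters" and _ to "exactly one character". It is only an approximation of SQL's behaviour: it ignores collation (so it is case-sensitive, which MySQL often is not) and escape characters. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

final class LikeSketch {
    // Translates a LIKE pattern to a regex: % -> .*, _ -> ., everything else literal.
    static boolean like(String value, String pattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(like("53544", "53"));   // false: no wildcard, so it acts like '='
        System.out.println(like("53544", "%53%")); // true: contains 53
        System.out.println(like("53544", "53%"));  // true: starts with 53
        System.out.println(like("53544", "%53"));  // false: does not end with 53
        System.out.println(like("535", "53_"));    // true: 53 plus exactly one character
    }
}
```

The first line shows why the query in the question only matched complete numbers.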
Your SQL query should be
"SELECT * FROM Clients WHERE Machine LIKE '%53%'"

How to fetch a row from one table and insert it into another table and get new PK value

I have two similar tables on different databases.
Database1/TableA
Database2/TableA
I want to fetch a row from one table and insert it into other table on other server. Like:
Database1/TableA
Id State Name
500 OH John [Fetch this row]
Database2/TableA
Id State Name
1 OH John [Insert and fetch PK '1']
I tried this using bulkcopy and it works fine.
But problem is I need to get PK from the new insert as I need to populate another child table.
Is there any better way to achieve this? Please on C# code, no database linking or SQL queries. Just C# solutions. Or if query can be used in C# code that is fine. Any working example code with Dataset or Datarow will be great help.
Thanks!
First you need to get the row(s) from Database1.TableA. You could, for example, use a SqlDataAdapter with a DataTable, or a SqlDataReader.
SCOPE_IDENTITY returns the last identity value inserted into an identity column in the same scope. A scope is a module: a stored procedure, trigger, function, or batch. Therefore, two statements are in the same scope if they are in the same stored procedure, function, or batch.
You can use SqlCommand.ExecuteScalar to execute the insert command and retrieve the new ID in one query.
const String sqlSelect = "SELECT COL1,COl2,Col3 FROM TableA WHERE COL1=@COL1;";
const String sqlInsert = "INSERT INTO TableA (COl2,Col3)VALUES (@Col2,@Col3);"
+ "SELECT CAST(scope_identity() AS int)";
using (var con1 = new SqlConnection(db1ConnectionString))
using (var con2 = new SqlConnection(db2ConnectionString))
{
con1.Open();
con2.Open();
using(var selectCommand = new SqlCommand(sqlSelect, con1))
{
selectCommand.Parameters.AddWithValue("@COL1", 4711);
using (var reader = selectCommand.ExecuteReader())
{
if (reader.Read())
{
int newID;
using (var insertCommand = new SqlCommand(sqlInsert, con2))
{
for (int i = 0; i < reader.FieldCount; i++)
{
insertCommand.Parameters.AddWithValue("@" + reader.GetName(i), reader[i]);
}
newID = (int)insertCommand.ExecuteScalar();
}
}
}
}
}

How to write Case Sensitive Query for MS Access?

I want to know the Select Query for MS Access with case sensitive.
I have two values for VirtualMonitorName as below
VCode VirtualMonitorName
Row 1 (1, 'VM1');
Row 2 (2, 'Vm1');
Here both values are different.
If I write
"SELECT VCode FROM VirtualMaster WHERE VirtualMonitorName like '" + Vm1 + "'";
It replies VCode = 1 Only.
You can use the StrComp() function with vbBinaryCompare for a case-sensitive comparison. Here is an example from the Immediate window to show how StrComp() works. See the Access help topic for more details.
? StrComp("a", "A", vbBinaryCompare)
1
? StrComp("a", "A",vbTextCompare)
0
StrComp() returns 0 if the first two arguments evaluate as equal, 1 or -1 if they are unequal, and Null if either argument is Null.
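For readers more at home in Java (used elsewhere on this page), the return values described above can be mimicked with a few lines. This is only a sketch of the semantics, not Access's implementation: the text mode here is a plain case-insensitive comparison, whereas Access's vbTextCompare also depends on the database sort order. The class and constant names are made up.

```java
final class StrCompSketch {
    static final int BINARY = 0; // same value as vbBinaryCompare
    static final int TEXT = 1;   // same value as vbTextCompare

    // Returns -1, 0 or 1 like StrComp(), or null if either argument is null.
    static Integer strComp(String a, String b, int mode) {
        if (a == null || b == null) {
            return null;
        }
        int raw = (mode == BINARY) ? a.compareTo(b) : a.compareToIgnoreCase(b);
        return Integer.signum(raw);
    }

    public static void main(String[] args) {
        System.out.println(strComp("a", "A", BINARY));     // prints 1
        System.out.println(strComp("a", "A", TEXT));       // prints 0
        System.out.println(strComp("Vm1", "Vm1", BINARY)); // prints 0
        System.out.println(strComp("Vm1", null, BINARY));  // prints null
    }
}
```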
To use the function in a query, supply the vbBinaryCompare constant's value (0) rather than its name.
SELECT VCode
FROM VirtualMaster
WHERE StrComp(VirtualMonitorName, "Vm1", 0) = 0;
This approach is also available to queries from other applications if they use the newer Access Database Engine ("ACE") drivers. For example, the following C# code
string myConnectionString =
@"Driver={Microsoft Access Driver (*.mdb, *.accdb)};" +
@"Dbq=C:\Users\Public\Database1.accdb;";
using (OdbcConnection con = new OdbcConnection(myConnectionString))
{
con.Open();
using (var cmd = new OdbcCommand())
{
cmd.Connection = con;
cmd.CommandText =
"SELECT COUNT(*) AS n FROM [VirtualMaster] " +
"WHERE StrComp([VirtualMonitorName],?,?) = 0";
cmd.Parameters.AddWithValue("?", "Vm1");
cmd.Parameters.Add("?", OdbcType.Int);
var vbCompareOptions = new Dictionary<string, int>()
{
{"vbBinaryCompare", 0},
{"vbTextCompare", 1}
};
string currentOption = "";
currentOption = "vbBinaryCompare";
cmd.Parameters[1].Value = vbCompareOptions[currentOption];
Console.WriteLine(
"{0} found {1} record(s)",
currentOption,
Convert.ToInt32(cmd.ExecuteScalar()));
currentOption = "vbTextCompare";
cmd.Parameters[1].Value = vbCompareOptions[currentOption];
Console.WriteLine(
"{0} found {1} record(s)",
currentOption,
Convert.ToInt32(cmd.ExecuteScalar()));
}
}
produces
vbBinaryCompare found 1 record(s)
vbTextCompare found 2 record(s)
Check this out:
https://support.microsoft.com/kb/244693?wa=wsignin1.0
This article describes four methods of achieving a case-sensitive JOIN using the Microsoft Jet database engine. Each of these methods has advantages and disadvantages that should be weighed before choosing an implementation. The methods are:
StrComp
Case-Sensitive IISAM Driver
Hexadecimal Expansion
Binary Field
Using only built-in functions, add an additional custom column in the query design view:
location: InStr(1,[VCode],"VM1",0)
The zero parameter requests a binary (case-sensitive) comparison when finding the location of "VM1" within [VCode].
Set the criteria in that column to >0, so that of the records matching Like "*vm*", only those containing the exact string VM1 are returned.
The WHERE clause looks like:
WHERE (((VirtualMaster.VCode) Like "*vm*") AND ((InStr(1,[VCode],"VM1",0))>0));
At a simpler level of coding:
As a condition in a DCount operation, checking a field (column) that has to have the correct case, and ignoring blank states/territories.
' lngCounter counts the State/Territory field (column) values
' that match the exact case of 'Ohio'.
' ([ID] is an AutoNumber ID field)
lngCounter = DCount("[id]", Trim(Me!tboDwellingTablename), "StrComp([State/territory],'Ohio',0) = 0")
This only does one letter:
MS-ACCESS SQL:
SELECT Asc(Left([Title],1)) AS t FROM Master WHERE (((Asc(Left([Title],1)))=105));
Title is the field you want to search.
Master is the table where the Title field is located.
105 is the ASCII code for the character i.
In this case, only titles that start with i (not I) are returned.
If you want to search for lowercase "a", change the 105 to 97.
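The same first-letter check can be sketched in Java (used elsewhere on this page) to confirm the character codes. The titles below are made up, and the class name is only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FirstLetterFilter {
    // True when the title's first character has the given code (105 = 'i', 97 = 'a').
    static boolean startsWithCode(String title, int code) {
        return !title.isEmpty() && title.charAt(0) == code;
    }

    // Keeps only the titles whose first character has the given code.
    static List<String> filter(List<String> titles, int code) {
        List<String> out = new ArrayList<>();
        for (String t : titles) {
            if (startsWithCode(t, code)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("iPhone", "Iceland", "apple", "island");
        System.out.println(filter(titles, 105)); // prints [iPhone, island]
        System.out.println(filter(titles, 97));  // prints [apple]
    }
}
```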