Removing near identical values from mysql table


Is there a way of removing near-identical values from a table in MySQL? My table has more than 10K records, and one of the companies appears in it like this:
id name
123 Vianet Group Inc
5214 Vianet Group, Inc.
Running DESCRIBE tablename gives this:
Field Type Null Key Default Extra
id int NO PRI auto_increment
name varchar(150) NO UNI
The two company names are the same, but I would like to delete the second instance from the table, keeping just a single instance of each name. This is just one example; there are others like it. Is there a quick way of removing near-identical values from the column? Please help.

You could try using soundex to find the "near identical" values -
SELECT *
FROM tablename t1
INNER JOIN tablename t2
ON t1.id < t2.id
AND SOUNDEX(t1.name) = SOUNDEX(t2.name)
You will need to test it against some of your example "near identical" values to see which cases it does and does not catch. As Akina suggested, you will probably need some kind of normalisation process (a stored function), or the Levenshtein distance function linked by Slava.
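To make the normalisation idea concrete, here is a minimal sketch in Python rather than a stored function. The rule it uses (lower-case, treat punctuation as spaces, collapse whitespace) is only an assumption about what "near identical" means for your data, the function names are made up for illustration, and the "Acme Ltd" row is invented. It keeps the lowest id in each group and reports the others as candidates for deletion.

```python
import re
from collections import defaultdict


def normalise(name):
    """Lower-case the name, turn runs of punctuation into a space, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", name.lower())
    return " ".join(cleaned.split())


def find_duplicates(rows):
    """Return {kept_id: [ids to delete]} for rows that share a normalised name."""
    groups = defaultdict(list)
    for row_id, name in rows:
        groups[normalise(name)].append(row_id)
    result = {}
    for ids in groups.values():
        if len(ids) > 1:
            ids.sort()
            result[ids[0]] = ids[1:]  # keep the lowest id, flag the rest
    return result


# The two rows from the question plus one invented, unrelated row.
rows = [(123, "Vianet Group Inc"), (5214, "Vianet Group, Inc."), (77, "Acme Ltd")]
duplicates = find_duplicates(rows)
print(duplicates)  # {123: [5214]}
```

Review the flagged ids by hand before deleting anything, since a rule this simple can also merge names that are genuinely different companies.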

Related

Query build automation

I have the following query
select count(t1.guid)
from table t1
where t1.id=X;
X is a result-set from this query
select ID
from table t2
where t2.flags=65537;
The above query returns 84 results, all of INT datatype.
id is primary key in t2 table,
and foreign key in t1 table;
guid is primary key in t1 table,
and doesn't exist anywhere else.
Object O1 has a unique identifier among the table that declares all objects and their properties (t2)
GUID in table t1 assigns unique identification to every instance of object O1 called by upper layers.
I want to see the number of duplicates for every object that fulfills the conditions in the second query.
I suppose I should declare a variable and a function that uses it, but I have no clue where to start or how to go about it.
I solved the problem once by hand-hacking it 84 times, but I am looking for a more elegant and more adaptive solution.

After spending a whole day on it, I figured it out.
Simply link the two posted queries together, but change the "=" operator to "in":
select count(t1.guid)
from table t1
where t1.id in
(select t2.ID
from table t2
where t2.flags=65537);
hand-hacking session avoided!
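For anyone who wants to try this out, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table names objects and instances and the sample rows are invented for illustration. The first query is the IN version from above, which gives one total; the second adds GROUP BY, which is one way to get a separate count per object, if that is what is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- objects plays the role of t2, instances the role of t1 (hypothetical names)
    CREATE TABLE objects (id INTEGER PRIMARY KEY, flags INTEGER);
    CREATE TABLE instances (guid TEXT PRIMARY KEY, id INTEGER REFERENCES objects(id));
    INSERT INTO objects VALUES (1, 65537), (2, 65537), (3, 0);
    INSERT INTO instances VALUES ('a', 1), ('b', 1), ('c', 2), ('d', 3);
""")

# One total across every object whose flags match.
total = conn.execute("""
    SELECT COUNT(t1.guid)
    FROM instances t1
    WHERE t1.id IN (SELECT t2.id FROM objects t2 WHERE t2.flags = 65537)
""").fetchone()[0]

# One count per matching object.
per_object = conn.execute("""
    SELECT t1.id, COUNT(t1.guid)
    FROM instances t1
    WHERE t1.id IN (SELECT t2.id FROM objects t2 WHERE t2.flags = 65537)
    GROUP BY t1.id
    ORDER BY t1.id
""").fetchall()

print(total)       # 3
print(per_object)  # [(1, 2), (2, 1)]
```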

Most efficient way to select data from one sql table and see if it matches data on another table in the same database

I have a database with 2 tables, both tables have around 200,000 records.
Let's call these tables TableA and TableB.
Currently I have a function that triggers a select query, which grabs all records in TableA that match a condition. Once I have that data, a foreach loop uses each record from TableA to check whether it matches any record in TableB.
The problem is that this takes a while because there are so many records. I know my approach works because it does what it's supposed to, but the script takes a good 3 minutes to finish. Is there a faster, more efficient way to do something like this?
Thank you in advance for the help.
PS: I'm using PHP.

The most efficient way to achieve what you want is to:
1. Create a primary key column for each table (if you do not already have one). Example schema where column "id" is a unique identifier for the table row:
TableA
id firstname lastname
1 Michael Douglas
2 Michael Jackson
TableB
id table_a_id pet
1 1 cat
2 2 ape
3 1 dog
Google or search here on stackoverflow on how to create or add a primary key for a mysql table column. An example of creating TableA with a primary key:
CREATE TABLE `TableA` (
`id` int(11) unsigned AUTO_INCREMENT,
`firstname` varchar(100),
`lastname` varchar(100),
PRIMARY KEY (`id`)
);
2. Create an SQL-query to fetch what you need. For example:
To get all rows with at least one match in BOTH tables:
SELECT TableA.id, TableA.firstname, TableA.lastname, TableB.pet
FROM TableA
INNER JOIN TableB
ON TableA.id = TableB.table_a_id;
To instead get all rows from TableA, and only the matching rows from TableB:
SELECT TableA.id, TableA.firstname, TableA.lastname, TableB.pet
FROM TableA
LEFT JOIN TableB
ON TableA.id=TableB.table_a_id;
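Here is a rough, runnable sketch of the two joins above, using Python's built-in sqlite3 module in place of MySQL and PHP. The third person ("Jane Doe", no pets) is invented so that the two joins return different results, and the index on table_a_id is an addition of mine: with 200,000 rows per table, an index on the join column is usually what lets the database avoid a full scan per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
    CREATE TABLE TableB (id INTEGER PRIMARY KEY, table_a_id INTEGER, pet TEXT);
    -- Index the join column so matching rows are found without scanning TableB.
    CREATE INDEX idx_tableb_table_a_id ON TableB (table_a_id);
    INSERT INTO TableA VALUES (1, 'Michael', 'Douglas'), (2, 'Michael', 'Jackson'),
                              (3, 'Jane', 'Doe');  -- invented row with no pets
    INSERT INTO TableB VALUES (1, 1, 'cat'), (2, 2, 'ape'), (3, 1, 'dog');
""")

# Only rows that have a match in both tables.
inner_rows = conn.execute("""
    SELECT TableA.id, TableB.pet
    FROM TableA
    INNER JOIN TableB ON TableA.id = TableB.table_a_id
    ORDER BY TableB.id
""").fetchall()

# Every row from TableA; pet is None where TableB has no match.
left_rows = conn.execute("""
    SELECT TableA.id, TableB.pet
    FROM TableA
    LEFT JOIN TableB ON TableA.id = TableB.table_a_id
    ORDER BY TableA.id, TableB.id
""").fetchall()

print(inner_rows)  # [(1, 'cat'), (2, 'ape'), (1, 'dog')]
print(left_rows)   # [(1, 'cat'), (1, 'dog'), (2, 'ape'), (3, None)]
```

The point is that a single query replaces the foreach loop, so the matching happens inside the database instead of in one round trip per row.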

The answer to your question ultimately depends on what you mean by "if it matches."
Let's assume, for a moment, that you have primary keys on each of these tables, TableA and TableB, and that you're NOT matching on those. Instead you have one or more other columns, the actual data that you're storing in each row, which you are considering for your matching. Let's call those ColA and ColB.
In that case you could use:
SELECT TableA.id, TableB.id, TableA.ColA, TableB.ColB
FROM TableA
LEFT JOIN TableB
ON (TableA.ColA = TableB.ColA)
AND (TableA.ColB = TableB.ColB);
... notice that we're using a compound expression on which to JOIN. You'd want to add an AND (TableA.XXX = TableB.XXX) for each column that you want to consider significant in your matching.
Of course I'm assuming that these tables don't share a common surrogate key (otherwise MicKri's JOIN would be simpler ... or a "NATURAL JOIN" would be even simpler still).
What you're doing, conceptually, is defining a pair of (mathematical) sets and finding the intersection between them. The complication of doing this in SQL is that real-world tables often have these extra columns (surrogate primary keys and foreign keys) which aren't attributes of the underlying entities ... but which serve to map relationships among them.
In my example I'm just showing a way to formulate a JOIN query that finds the intersection based only on the attributes that are significant for your purposes.
(By the way, the parentheses in my example are there for human legibility. They should not be required by your SQL engine ... though they don't hurt, either).
Here's one of a number of visual explanations of SQL JOINs that's handy for learning this sort of thing. An INNER JOIN is an intersection. The ON and WHERE clauses define the subsets of the data (columns and rows, respectively) which are to be related.
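As a quick illustration of matching on several data columns at once, here is a sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows are invented. It uses an INNER JOIN so that only the intersection comes back; a LEFT JOIN with the same ON clause would also list the TableA rows that have no match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER PRIMARY KEY, ColA TEXT, ColB INTEGER);
    CREATE TABLE TableB (id INTEGER PRIMARY KEY, ColA TEXT, ColB INTEGER);
    INSERT INTO TableA VALUES (1, 'x', 1), (2, 'y', 2), (3, 'z', 3);
    INSERT INTO TableB VALUES (10, 'x', 1), (11, 'y', 9), (12, 'z', 3);
""")

# A row pair matches only if every significant column agrees.
matches = conn.execute("""
    SELECT TableA.id, TableB.id
    FROM TableA
    INNER JOIN TableB
        ON (TableA.ColA = TableB.ColA)
       AND (TableA.ColB = TableB.ColB)
    ORDER BY TableA.id
""").fetchall()

print(matches)  # [(1, 10), (3, 12)]
```

Row 2 drops out because its ColB value differs (2 versus 9), even though ColA agrees.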

Is there a keyword to identify a PRIMARY column in a MySQL WHERE clause?

I have a situation where the column name "field1" and "field3" are not given to me but all the other data is. The request is coming in via a url in like: /table1/1 or /table2/3 and it is assumed that 1 or 3 represent the primary key. However, the column name may be different. Consider the following 2 queries:
SELECT * FROM table1 where field1 = 1 and field2 =2;
SELECT * FROM table2 where field3 = 3 and field4 =4;
Ideally, I'd like to perform a search like the following:
SELECT * FROM table1 where MYSQL_PRIMARY_COLUMN = 1 and field2 =2;
SELECT * FROM table2 where MYSQL_PRIMARY_COLUMN = 3 and field4 =4;
Is there a keyword to identify MYSQL_PRIMARY_COLUMN in a MySQL WHERE clause?

No, there's no pseudocolumn you can use to map to the primary key column. One reason this is complicated is that a given table may have a multi-column primary key. This is a totally ordinary way to design a table:
CREATE TABLE BooksAuthors (
book_id INT NOT NULL,
author_id INT NOT NULL,
PRIMARY KEY (book_id, author_id)
);
When I implemented the table data gateway class in Zend Framework 1.0, I had to write code to "discover" the table's primary key column (or columns) as @doublesharp describes. The table object instance then retained this information so that searches and updates knew which column (or columns) to use when generating queries.
I understand you're looking for a solution that doesn't require this "2 pass process" but no such solution exists for the general case.
Some application framework environments attempt to simplify the problem by encouraging developers to give every table a single column primary key, and name the column "id" by convention. But this convention is an oversimplification that fails to cover many legitimate table designs.

You can use DESCRIBE (which is a synonym for EXPLAIN) to get information about the table, including all the column information.
DESCRIBE `table`;
You can also use SHOW INDEX to just get information about the PRIMARY key.
SHOW INDEX FROM `table` WHERE Key_name = 'PRIMARY'
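To show what the "2 pass process" can look like in application code, here is a rough sketch using Python's built-in sqlite3 module. SQLite has no SHOW INDEX, so the sketch uses SQLite's PRAGMA table_info for the discovery step; against MySQL you would run the SHOW INDEX query above instead. The helper names are made up for illustration.

```python
import sqlite3


def primary_key_columns(conn, table):
    """Pass 1: return the primary key column names of `table`, in key order."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk);
    # pk is 0 for non-key columns, otherwise the 1-based position in the key.
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    key_cols = sorted((row[5], row[1]) for row in rows if row[5] > 0)
    return [name for _, name in key_cols]


def fetch_by_key(conn, table, key_values):
    """Pass 2: build the WHERE clause from the discovered key columns."""
    # `table` must come from a trusted allow-list, never straight from the URL.
    cols = primary_key_columns(conn, table)
    if len(cols) != len(key_values):
        raise ValueError("number of key values does not match the primary key")
    where = " AND ".join(f'"{col}" = ?' for col in cols)
    return conn.execute(f'SELECT * FROM "{table}" WHERE {where}', key_values).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (field1 INTEGER PRIMARY KEY, field2 INTEGER);
    CREATE TABLE BooksAuthors (book_id INT NOT NULL, author_id INT NOT NULL,
                               PRIMARY KEY (book_id, author_id));
    INSERT INTO table1 VALUES (1, 2);
""")

print(primary_key_columns(conn, "BooksAuthors"))  # ['book_id', 'author_id']
print(fetch_by_key(conn, "table1", [1]))          # [(1, 2)]
```

In a real application you would probably cache the discovered key columns per table, as the Zend Framework answer describes, so the first pass runs once rather than on every request.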

Inserting data from one table to another?

I've got two tables and I'm trying to insert data from one into the other. I've found a few examples on the web of how this can be done, but they mostly rely on the two tables having an identical structure, whereas I'm trying to insert data from one table into another table with quite a different structure.
I'm trying to insert data from a table called 'catalog_product_entity_media_gallery' into a table called 'catalog_product_entity_varchar'. Below is a simple description of their structure
The 'catalog_product_entity_varchar' looks as follows:
value_id | entity_type_id | attribute_id | store_id | entity_id | value
PK | INT | INT | INT | INT | VARCHAR
And the 'catalog_product_entity_media_gallery' table looks as follows:
value_id | attribute_id | entity_id | value
PK | INT | INT | VARCHAR
I need to insert the entity_id and value columns from catalog_product_entity_media_gallery into catalog_product_entity_varchar. However, as you can see, the structure is quite different.
The query I'm trying to use is as follows
USE magento_db;
INSERT INTO catalog_product_entity_varchar(entity_type_id, attribute_id, store_id, entity_id, value)
SELECT
4,
74,
0,
catalog_product_entity_media_gallery.entity_id,
catalog_product_entity_media_gallery.value
FROM catalog_product_entity_media_gallery;
I only need the entity_id and value from media_gallery; the other values are always the same. I have tried to do this using the query above, but it just hangs in MySQL (no errors).
I think it's because I'm trying to select 4, 74 and 0 from catalog_product_entity_media_gallery, but I'm not 100% sure (apologies, I'm a bit of a novice with MySQL).
Can anybody point me in the right direction? Is there any way I can insert some data from the media table whilst inserting static values for some columns? (I hope this all makes sense.)

The query syntax is ok.
However, there may be issues with the unique and foreign keys in the catalog_product_entity_varchar table that prevent you from inserting the data. The query may also be waiting for some other query to complete (if yours is part of a bigger scenario), which would be a locking issue. The first case is the most probable.
Currently, the question lacks important details:
The MySQL client or program code you use to run the query, so we cannot see the case in full or reproduce it correctly.
The scenario in which you run it, i.e. whether you do it inside the Magento application in some module during a web request, or whether there are other queries in your script, open transactions, other people accessing the DB server, etc.
Based on the most probable assumption, that you just don't see the actual error from the unique/foreign keys, you may try the following queries.
1) Unique index failure.
Try this:
USE magento_db;
INSERT INTO catalog_product_entity_varchar(entity_type_id, attribute_id, store_id, entity_id, value)
SELECT
4 as etid,
74 as aid,
0 as sid,
catalog_product_entity_media_gallery.entity_id as eid,
catalog_product_entity_media_gallery.value as val
FROM
catalog_product_entity_media_gallery
GROUP BY
eid, aid, sid;
It is very likely that you are inserting non-unique entries, because catalog_product_entity_media_gallery can hold multiple entries for the same product, while catalog_product_entity_varchar cannot. If the query above completes successfully, then the issue really is the unique key. In that case you must re-verify what you want to achieve, because the initial aim (not the query itself) is wrong.
2) Wrong foreign key (non-existing attribute 74)
Try this (replacing ATTRIBUTE_CODE and ATTRIBUTE_ENTITY_TYPE_ID with the values you need, e.g. 'firstname' and 6):
USE magento_db;
INSERT INTO catalog_product_entity_varchar(entity_type_id, attribute_id, store_id, entity_id, value)
SELECT
4 as etid,
eav_attribute.attribute_id as aid,
0 as sid,
gallery.entity_id as eid,
gallery.value as val
FROM
catalog_product_entity_media_gallery AS gallery
INNER JOIN
eav_attribute
ON
eav_attribute.attribute_code = '<ATTRIBUTE_CODE>'
AND eav_attribute.entity_type_id = <ATTRIBUTE_ENTITY_TYPE_ID>
GROUP BY
eid, aid, sid;
If it executes successfully and some rows are added to catalog_product_entity_varchar, then 74 was probably the wrong id for the attribute you needed, so the foreign key in catalog_product_entity_varchar did not allow you to insert the records.
If it executes successfully but no rows are added to catalog_product_entity_varchar, then you have probably made a mistake in the attribute code or entity type. Recheck what you put in as ATTRIBUTE_CODE and ATTRIBUTE_ENTITY_TYPE_ID.
If both queries still hang, then you have issues with your MySQL client, server or execution scenario.
Note: your initial query may make sense in your specific case, but there are signs that something may be wrong with your approach:
You're using hard-coded numbers for ids, but ids differ between installations and Magento versions. It is better to use more stable values, like the attribute code in the second query, and look up the actual attribute id from it.
You copy data from catalog_product_entity_media_gallery, which can store multiple entries for the same product, to catalog_product_entity_varchar, which can store only one entry per product. That means you cannot copy all the data this way. Your query probably doesn't reflect the goal you want to achieve.
The entity type id inserted into catalog_product_entity_varchar is not tied to the attribute id, while in Magento these two are closely connected. Putting the wrong entity type id in the table will either make Magento behave incorrectly, or it won't notice your changes at all.
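The unique-key failure described above can be reproduced on a small scale. This is only a sketch: it uses Python's built-in sqlite3 module instead of MySQL, shortened hypothetical table names (gallery, entity_varchar), invented sample rows, and it assumes the unique key covers (entity_id, attribute_id, store_id), which is what the GROUP BY in the first query implies. It takes MIN(value) to pick one value per product so that the grouped query is well defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gallery (value_id INTEGER PRIMARY KEY, attribute_id INTEGER,
                          entity_id INTEGER, value TEXT);
    CREATE TABLE entity_varchar (value_id INTEGER PRIMARY KEY, entity_type_id INTEGER,
                                 attribute_id INTEGER, store_id INTEGER,
                                 entity_id INTEGER, value TEXT,
                                 UNIQUE (entity_id, attribute_id, store_id));
    -- Product 1 has two gallery entries, product 2 has one (invented data).
    INSERT INTO gallery (attribute_id, entity_id, value) VALUES
        (88, 1, '/a.jpg'), (88, 1, '/b.jpg'), (88, 2, '/c.jpg');
""")

# Copying every gallery row with constant ids collides on the unique key.
try:
    with conn:
        conn.execute("""
            INSERT INTO entity_varchar
                (entity_type_id, attribute_id, store_id, entity_id, value)
            SELECT 4, 74, 0, entity_id, value FROM gallery
        """)
    plain_insert_ok = True
except sqlite3.IntegrityError:
    plain_insert_ok = False

# Collapsing to one row per product avoids the collision.
with conn:
    conn.execute("""
        INSERT INTO entity_varchar
            (entity_type_id, attribute_id, store_id, entity_id, value)
        SELECT 4, 74, 0, entity_id, MIN(value) FROM gallery GROUP BY entity_id
    """)

rows = conn.execute(
    "SELECT entity_id, value FROM entity_varchar ORDER BY entity_id"
).fetchall()
print(plain_insert_ok)  # False
print(rows)             # [(1, '/a.jpg'), (2, '/c.jpg')]
```

Note that the static values are not the problem: selecting constants alongside columns in INSERT ... SELECT is fine. What fails here is the duplicate (entity_id, attribute_id, store_id) combination.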

Try this:
INSERT INTO catalog_product_entity_varchar (entity_id, value)
SELECT entity_id, value
FROM catalog_product_entity_media_gallery
WHERE value_id = <VALUE_ID>;
-- replace <VALUE_ID> with the value_id of the row that has those values (4, 74, 0)

Assuming value_id in the catalog_product_entity_varchar table is auto-incremented, could you not do the following?
USE magento_db;
INSERT INTO catalog_product_entity_varchar(entity_type_id, store_id, entity_id, value)
SELECT
4,
74,
catalog_product_entity_media_gallery.entity_id,
catalog_product_entity_media_gallery.value
FROM catalog_product_entity_media_gallery;
Note that there is no attribute_id column in your catalog_product_entity_varchar table.

MySQL Using a select to update multiple columns and records in a table

I've just added two columns to a table I was working on, and want to put numerical values in those two columns based on the value of another column. I need to do this for all of the records in the database (MySQL), can someone help me with this please?
The table (t1) looks like
spectra VARCHAR (primary key),
value REAL (this is the one to use),
new_val1 REAL,
new_val2 REAL
I'm thinking it should look something like:
update t1 set (new_val1,new_val2)=(select value*1.5,value*2 from t1) as t2 where t1.spectra=t2.spectra;
Any suggestions please?
Thanks,
James

All you need is:
update t1 set new_val1 = value*1.5, new_val2 = value*2;
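A quick way to convince yourself that this touches every row is to run it on a toy table. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, with two invented rows. With no WHERE clause, each row's new columns are computed from that same row's value, so no subquery or self-join is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (spectra TEXT PRIMARY KEY, value REAL,
                     new_val1 REAL, new_val2 REAL);
    INSERT INTO t1 (spectra, value) VALUES ('s1', 2.0), ('s2', 10.0);
""")

# No WHERE clause: the assignment is applied to every row in the table.
with conn:
    conn.execute("UPDATE t1 SET new_val1 = value * 1.5, new_val2 = value * 2")

rows = conn.execute(
    "SELECT spectra, new_val1, new_val2 FROM t1 ORDER BY spectra"
).fetchall()
print(rows)  # [('s1', 3.0, 4.0), ('s2', 15.0, 20.0)]
```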
