I'm working on a price comparison program for 3 websites. Each website may list the same product as the other websites, but the product names are not exactly the same (e.g. "Asus X553MA-XX102H Intel Celeron N2930 4GB 1TB DVDRW 15.6 Windows 8.1" and "Asus X553MA 15.6 Inch Intel Celeron 4GB 1TB Laptop" are the same product, but the names differ).
I crawled data from the 3 websites into a MySQL table called crawledproduct (which has 3 columns: sourceurl, productname, price).
Please help me write a MySQL query that finds all matching products by product name.
For example: SELECT * FROM crawledproduct WHERE [similar to 'Asus X553MA 15.6 Inch Intel Celeron 4GB 1TB Laptop']
Thanks for any help.
I am assuming that the product name is given as input by the user, or that you already know the product name you want to compare.
You need the LIKE operator in your query. Suppose you want to search for the word 'axus':
SELECT productname FROM crawledproduct
WHERE productname LIKE '%axus%';
% is a wildcard that matches any sequence of zero or more characters, so LIKE tells the DBMS to search for that pattern. Suppose you want to search for "axus" in every row of column A:
LIKE '%axus'  -- rows whose value in column A ends with axus
LIKE 'axus%'  -- rows whose value in column A starts with axus
LIKE '%axus%' -- rows whose value in column A contains axus anywhere
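To see the three patterns side by side, here is a minimal sketch that uses Python's built-in sqlite3 module as a runnable stand-in for MySQL (the % wildcard behaves the same way in both; case sensitivity in MySQL depends on the column's collation). The product names and source labels are made-up sample data.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; only the LIKE behaviour matters here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crawledproduct (sourceurl TEXT, productname TEXT, price REAL)")
conn.executemany(
    "INSERT INTO crawledproduct (sourceurl, productname) VALUES (?, ?)",
    [
        ("site-a", "axus X1 laptop"),      # starts with axus
        ("site-b", "budget laptop axus"),  # ends with axus
        ("site-c", "new axus X1 laptop"),  # contains axus in the middle
        ("site-a", "acer laptop"),         # no axus at all
    ],
)

def search(pattern):
    """Return the product names matching a LIKE pattern, sorted for stable output."""
    rows = conn.execute(
        "SELECT productname FROM crawledproduct WHERE productname LIKE ? ORDER BY productname",
        (pattern,),
    ).fetchall()
    return [name for (name,) in rows]

print(search("axus%"))   # -> ['axus X1 laptop']
print(search("%axus"))   # -> ['budget laptop axus']
print(search("%axus%"))  # -> ['axus X1 laptop', 'budget laptop axus', 'new axus X1 laptop']
```

Passing the pattern as a bound parameter (the ?) rather than pasting it into the SQL string also keeps user input from being interpreted as SQL.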
Of course, you need to choose the search term carefully to get all the products: if a product's name does not contain the keyword you specified, it won't appear in the output. There are several other ways to search for a pattern in a database table.
You might want to do a bit of research on those, because I am a beginner and don't have much knowledge yet.
Good luck!
Kudos! :)
If you have a standard set of criteria that you know will always appear in similar product names, you could use a LIKE query to get what you need.
For instance, given your example above, you can get it like this:
SELECT productname FROM crawledproduct
WHERE productname LIKE '%X553MA%';
This would work. Note that if you have a lot of data, this can be slow (a pattern that starts with % prevents MySQL from using an ordinary index), so you might want to take advantage of MySQL's full-text search, which is usually much faster. You would need to add a FULLTEXT index on the product name column and then run a query like this:
ALTER TABLE crawledproduct ADD FULLTEXT (productname);
SELECT productname FROM crawledproduct WHERE MATCH(productname) AGAINST ('X553MA');
Edit:
Note that both of these queries assume that X553MA will appear in all product names. You'd have to be careful about how you choose your search term.
Edit:
If you do not know the keyword, you could create a form that searches all three sites; the user would enter the keyword into this form.
For instance, using LIKE as mentioned, you could store each website's information in its own table in a database (assuming you have access to do so) and search like so:
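-- search each site's table separately and combine the matches
SELECT 'A' AS source, name FROM TableA WHERE name LIKE '%$search_term%'
UNION ALL
SELECT 'B', name FROM TableB WHERE name LIKE '%$search_term%'
UNION ALL
SELECT 'C', name FROM TableC WHERE name LIKE '%$search_term%'
and you would have $search_term come from the user (escape it or, better, pass it as a prepared-statement parameter to avoid SQL injection).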
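A minimal sketch of the per-site search, again using Python's sqlite3 as a runnable stand-in for MySQL. TableA, TableB and TableC come from the answer; the 'A'/'B'/'C' source labels and the sample rows are assumptions for illustration. The user's term is bound as a parameter instead of being interpolated into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical per-site tables, one per crawled website.
for table in ("TableA", "TableB", "TableC"):
    conn.execute(f"CREATE TABLE {table} (name TEXT)")
conn.execute("INSERT INTO TableA VALUES (?)",
             ("Asus X553MA-XX102H Intel Celeron N2930 4GB 1TB DVDRW 15.6 Windows 8.1",))
conn.execute("INSERT INTO TableB VALUES (?)",
             ("Asus X553MA 15.6 Inch Intel Celeron 4GB 1TB Laptop",))
conn.execute("INSERT INTO TableC VALUES (?)", ("Acer Aspire E15",))

QUERY = """
SELECT 'A' AS source, name FROM TableA WHERE name LIKE ?
UNION ALL
SELECT 'B', name FROM TableB WHERE name LIKE ?
UNION ALL
SELECT 'C', name FROM TableC WHERE name LIKE ?
"""

def search_all_sites(search_term):
    """Return (source, name) rows from every site whose name contains search_term."""
    pattern = f"%{search_term}%"  # note: % or _ typed by the user still act as wildcards
    return conn.execute(QUERY, (pattern, pattern, pattern)).fetchall()

for source, name in search_all_sites("X553MA"):
    print(source, name)
```

UNION ALL keeps each site's matches as separate rows, so names that differ between sites are still returned as long as each contains the keyword.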
However, if you are looking to actually crawl the sites, SQL is not the tool you want.
Related
I'm working on something that shows the shops in a specific category. However, I have an issue: I store a shop's categories in a single field as a comma-separated list of category ids, like "1,5,12". The problem is that if I want to show shops in category 2, the query mistakes 12 for category 2. This is the SQL right now:
SELECT * FROM shops WHERE shop_cats LIKE '%".$sqlid."%' LIMIT 8
Is there a way to split the shop_cats field on commas in SQL, so that it checks the whole number? The only way I can think of is to fetch all the shops and do the filtering in PHP, but I don't like that, as it would use too many resources.
This is a really, really bad way to store categories, for many reasons:
You are storing numbers as strings.
You cannot declare proper foreign key relationships.
A (normal) column in a table should have only one value.
SQL has poor string functions.
The resulting queries cannot take advantage of indexes.
The proper way to store this information in a database is with a junction table that has one row per shop/category pair.
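A minimal sketch of the junction-table design, run through Python's sqlite3 as a stand-in for MySQL. The categories and shop_categories tables, their shop_id/category_id columns and the sample shops are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shops (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
-- Junction table: one row per (shop, category) pair, with real foreign keys.
CREATE TABLE shop_categories (
    shop_id INTEGER NOT NULL REFERENCES shops(id),
    category_id INTEGER NOT NULL REFERENCES categories(id),
    PRIMARY KEY (shop_id, category_id)
);
""")
conn.executemany("INSERT INTO shops VALUES (?, ?)", [(1, "Shop One"), (2, "Shop Two")])
conn.executemany("INSERT INTO categories VALUES (?, ?)",
                 [(c, f"Category {c}") for c in (1, 2, 5, 12)])
# Shop 1 was "1,5,12" in the comma-separated design; shop 2 is only in category 2.
conn.executemany("INSERT INTO shop_categories VALUES (?, ?)",
                 [(1, 1), (1, 5), (1, 12), (2, 2)])

def shops_in_category(category_id, limit=8):
    """Names of shops in the given category, matched on the whole id."""
    rows = conn.execute(
        "SELECT s.name FROM shops s "
        "JOIN shop_categories sc ON sc.shop_id = s.id "
        "WHERE sc.category_id = ? ORDER BY s.id LIMIT ?",
        (category_id, limit),
    ).fetchall()
    return [name for (name,) in rows]

print(shops_in_category(2))   # category 12 is no longer mistaken for 2
```

Because category_id is an integer compared with =, the query can use the primary key index and there is no string matching at all.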
Sometimes, we are stuck with other people's really bad design decisions. If this is your case, then you can use FIND_IN_SET():
WHERE FIND_IN_SET($sqlid, shop_cats) > 0
But you should really fix the data structure.
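To see why FIND_IN_SET() does not confuse 12 with 2, here is a rough Python model of its documented behaviour: it returns the 1-based position of the value in a comma-separated list, 0 if it is absent or the list is empty, and NULL (None here) if either argument is NULL. This is an approximation; MySQL compares items using the column's collation, while this sketch compares strings exactly.

```python
def find_in_set(needle, haystack):
    """Approximate MySQL FIND_IN_SET(needle, haystack) on whole comma-separated items."""
    if needle is None or haystack is None:
        return None
    if haystack == "":
        return 0  # MySQL returns 0 for an empty list
    needle = str(needle)
    for position, item in enumerate(haystack.split(","), start=1):
        if item == needle:  # whole-item comparison, so "12" never matches "2"
            return position
    return 0

print(find_in_set(2, "1,5,12"))   # -> 0
print(find_in_set(12, "1,5,12"))  # -> 3
```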
If you can, the correct solution is to normalize the table, i.e. store a separate row per category instead of a comma-separated list.
If you can't, this should do the job:
SELECT * FROM shops WHERE CONCAT(',' , shop_cats , ',') LIKE '%,".$sqlid.",%' LIMIT 8
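The CONCAT trick works because wrapping the whole list in commas makes every id, including the first and the last, appear as ",<id>,". A minimal sketch with Python's sqlite3 as a stand-in for MySQL (SQLite concatenates with || where MySQL uses CONCAT()); the sample shops are made up, and the id is passed as a bound parameter rather than spliced into the string as in the PHP snippet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shops (id INTEGER PRIMARY KEY, shop_cats TEXT)")
conn.executemany("INSERT INTO shops VALUES (?, ?)",
                 [(1, "1,5,12"), (2, "2"), (3, "3,2,7")])

def shops_with_category(category_id):
    """Ids of shops whose comma-separated shop_cats contains category_id as a whole item."""
    rows = conn.execute(
        # ',' || shop_cats || ',' turns "1,5,12" into ",1,5,12," so ",2," cannot hit "12".
        "SELECT id FROM shops WHERE ',' || shop_cats || ',' LIKE ? ORDER BY id",
        (f"%,{category_id},%",),
    ).fetchall()
    return [shop_id for (shop_id,) in rows]

print(shops_with_category(2))   # -> [2, 3]
print(shops_with_category(12))  # -> [1]
```

Like FIND_IN_SET(), this still scans every row, which is why the answers recommend normalizing instead.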
The shops table does not follow 1NF (first normal form), i.e. every column should hold exactly one value. To fix that, you need to create another table, often called a pivot (junction) table, which relates the two tables (or entities). But to answer your question, the SQL query below should do the trick.
SELECT * FROM shops WHERE concat(',',shop_cats,',') LIKE '%,".$sqlid.",%' LIMIT 8
We would like to filter purchase orders either by purchase order id (primary key) or by purchase order name, using a single search box.
We used the LIKE operator to search the name field, but it doesn't seem to work on the primary key; it works only when we use the equality operator for ids. We would prefer to be able to filter purchase orders using LIKE for ids as well. How can we do this?
create table purchase_orders (
id int(11) primary key,
name varchar(255),
...
)
Option 1
SELECT *
FROM purchase_orders
WHERE id LIKE '%123%'; -- tribute to TemporaryNickName
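This is horrible performance-wise, because MySQL has to convert every id to a string and check every row; the primary key index cannot be used to narrow down the search :)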
Option 2a
Add a text column which receives a string version of id. Maybe add some triggers to populate it automatically.
Option 2b
Change the type of id column to CHAR or VARCHAR (I believe CHAR should be preferred for a primary key).
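In both cases (2a and 2b), add an index (maybe a FULLTEXT one) on this column. Note that an ordinary index only helps prefix searches such as LIKE '123%'; a pattern that starts with % still has to check every row.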
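Option 2a could look roughly like the sketch below, written with Python's sqlite3 as a runnable stand-in. The id_text column, the po_fill_id_text trigger and the sample orders are hypothetical, and MySQL's trigger syntax differs (there you would typically set NEW.id_text in a BEFORE INSERT trigger, or use a generated column on MySQL 5.7+).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchase_orders (
    id INTEGER PRIMARY KEY,
    name TEXT,
    id_text TEXT  -- hypothetical string copy of id (option 2a)
);
-- Keep id_text in sync automatically after each insert.
CREATE TRIGGER po_fill_id_text AFTER INSERT ON purchase_orders
BEGIN
    UPDATE purchase_orders SET id_text = CAST(NEW.id AS TEXT) WHERE id = NEW.id;
END;
""")
conn.executemany("INSERT INTO purchase_orders (id, name) VALUES (?, ?)",
                 [(123, "Office chairs"), (4123, "Printer paper"), (77, "Laptops")])

def search_orders(term):
    """One search box: match the term against the textual id or the name."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT id FROM purchase_orders WHERE id_text LIKE ? OR name LIKE ? ORDER BY id",
        (pattern, pattern),
    ).fetchall()
    return [order_id for (order_id,) in rows]

print(search_orders("123"))    # -> [123, 4123]
print(search_orders("paper"))  # -> [4123]
```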
I think LIKE should work. I assume that your SQL wasn't correctly written.
Let's assume you have an order named "ABCDEF"; then you can find it using the following query:
SELECT id FROM purchase_orders WHERE name LIKE '%CD%';
The % sign is a wildcard that matches any sequence of characters, so this query selects every name that contains "CD".
According to the table structure, name is a VARCHAR(255). That is quite a long string, so searching it with operators like LIKE is probably going to consume more resources and take more time. You can always search by id (WHERE id = something), which is a much faster way, but I don't think an order id is user-friendly data; instead I would let users search by name. My recommendation is to use Apache Lucene or MySQL's full-text search feature, which can improve search performance.
Apache Lucene
MySQL full-text search functions
These are tools built to search for a pattern or word across large sets of strings much faster. Many websites use them to build their own mini search engines. I found that MySQL's full-text search functions require pretty much no learning curve and are straightforward to use =D
Thank you in advance for any help you may be able to offer!
I'm working with a bit of an odd database where products are related via tags and are not hierarchical.
I'm trying to select a single product using a SKU number from a table and join it with a table of product reviews like so:
SELECT ims.master_sku, ims.title, ims.price,
ims.description, ir.mvp_number, ir.title,
ir.review, ir.rating, ir.created_on
FROM default_inventory_master_skus AS ims
JOIN default_inventory_reviews AS ir
WHERE ims.master_sku = '22284319'
GROUP BY ir.review;
This gives me around 150 rows - which are all the same product but contain different reviews. My question is how can I return just the one product (as a single row) and somehow convert the reviews into columns associated with that one product?
Again - thank you for your time and help.
Rich
You can do that, although it's not "relational".
Looks like someone wants this data in Excel ;).
With MySQL, you will need to generate an SQL statement and execute it, either within MySQL (in a stored procedure, using PREPARE and EXECUTE) or outside it (e.g., in PHP). First query for the pivot column names, then put together the statement, then execute it.
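A minimal sketch of those three steps, done "outside" the database in Python with sqlite3 as a stand-in for MySQL. It assumes a simplified, hypothetical reviews table with a review_no column that numbers each product's reviews; the real default_inventory_reviews table would need its own join key and numbering. The generated statement uses conditional aggregation (MAX(CASE ...)), which works the same way in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (master_sku TEXT, review_no INTEGER, review TEXT)")
conn.executemany("INSERT INTO reviews VALUES (?, ?, ?)",
                 [("22284319", 1, "Great"), ("22284319", 2, "Okay"),
                  ("22284319", 3, "Broke fast")])

def pivot_reviews(sku):
    """Return one row: the SKU followed by one column per review, or None if no reviews."""
    # Step 1: query for the pivot column values.
    numbers = [n for (n,) in conn.execute(
        "SELECT DISTINCT review_no FROM reviews WHERE master_sku = ? ORDER BY review_no",
        (sku,))]
    if not numbers:
        return None
    # Step 2: build one conditional-aggregate column per value (int() keeps the SQL safe).
    columns = ", ".join(
        f"MAX(CASE WHEN review_no = {int(n)} THEN review END) AS review_{int(n)}"
        for n in numbers)
    sql = (f"SELECT master_sku, {columns} FROM reviews "
           "WHERE master_sku = ? GROUP BY master_sku")
    # Step 3: execute the generated statement.
    return conn.execute(sql, (sku,)).fetchone()

print(pivot_reviews("22284319"))
```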
An idea of the implementation is here:
http://www.artfulsoftware.com/infotree/queries.php#78
I know there are several questions like this one, but I feel I've spent more than enough time trying different examples for a simple little hobby project. Yes, I'm being lazy, but in my defense, it's Saturday morning...
So I've got a table, Items, which has the following fields (among other irrelevant ones):
id (varchar), product (varchar), provider (varchar)
Items are part of products, and have a provider.
I want to write a query where I can get one item per product a given provider supplies.
So even if a provider supplies all items for a product, I just want one of those.
I know I can't do DISTINCT on only one field, but I've tried different variants of joins.
Should be simple for those of you who actually know what you're doing.
MySQL has gone to the trouble of creating a website that contains a manual and references for all its features.
For DISTINCT:
http://dev.mysql.com/doc/refman/5.0/en/distinct-optimization.html
SELECT DISTINCT columnname FROM MYTABLE WHERE RULESAPPLY
You could use the DISTINCT keyword:
SELECT DISTINCT * FROM ...
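Note that DISTINCT removes only rows that are identical in every selected column, so DISTINCT * will not reduce the Items table to one item per product. A common alternative, not taken from the answers above, is GROUP BY product with an aggregate such as MIN(id) to pick one item deterministically. A minimal sketch with Python's sqlite3 as a stand-in for MySQL; the sample items and provider names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (id TEXT, product TEXT, provider TEXT)")
conn.executemany("INSERT INTO Items VALUES (?, ?, ?)", [
    ("i1", "p1", "acme"), ("i2", "p1", "acme"), ("i3", "p2", "acme"),
    ("i4", "p2", "other"), ("i5", "p3", "other"),
])

def one_item_per_product(provider):
    """(product, item id) pairs: one item per product supplied by the provider."""
    # GROUP BY collapses each product to one row; MIN(id) picks a deterministic item.
    return conn.execute(
        "SELECT product, MIN(id) FROM Items WHERE provider = ? "
        "GROUP BY product ORDER BY product",
        (provider,),
    ).fetchall()

print(one_item_per_product("acme"))  # -> [('p1', 'i1'), ('p2', 'i3')]
```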
I'm stuck on quite a tricky problem.
I have a list of products from different warehouses, where each product has a Brand and a Model plus some extra details. The Model can be quite different between warehouses for the same product, but the Brand is always the same.
I store all the products in one table; let's call it the Product table.
Then I have another table, Model, with the CORRECT model name, brand and additional details like image, description, etc. It also has a keywords column where I try to add all keywords manually.
And here is the problem: I need to associate each product that I receive from a warehouse with one record from my Model table. Right now I'm using full-text search in boolean mode, but that's quite painful and does not work very well; I need to do a lot of manual work.
Here are just a few examples of names that I have:
WINT.SPORT3D
WINT.SPORT3D XL
WINT.SPORT 3D
WINT.SPORT3D MO
WINTER SPORT 3D
The correct name for all of these items would be: WINTER SPORT 3D, so they should all be assigned to the same model.
So, is there any way to improve the full-text search, or some other technique that would solve my problem?
The database I'm using is MySQL, and I would prefer not to change it.
I'll start by putting together a more formal definition of the tables:
warehouse:
warehouse_id,
warehouse_product_id,
product_brand,
product_name,
local_id
Here I'm using local_id as a foreign key to your 'Model' table, but to avoid further confusion, I'll call that table 'local':
local:
id,
product_brand,
product_name
It seems like the table you describe as 'product' is redundant.
Obviously, until the data is cross-referenced, local_id will be null. But once it is populated it won't have to change, and given a warehouse_id, a brand and a product name, you can find your local descriptor easily:
SELECT local.*
FROM local, warehouse
WHERE local.id=warehouse.local_id
AND warehouse.product_brand=local.product_brand
AND warehouse_id=_____
AND warehouse.product_brand=____
AND warehouse.product_name=____
So all you need to do is populate the links. Soundex is a rather crude tool; a better solution for this would be the Levenshtein distance algorithm. There's a MySQL implementation here.
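For reference, the Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. The sketch below is a standard two-row dynamic-programming version in Python; it is not the linked MySQL implementation, just an illustration of what the levenstein() function in the queries is expected to compute.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions all cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]

print(levenshtein("WINT.SPORT3D", "WINTER SPORT 3D"))  # -> 4
```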
Given a set of rows in the warehouse table which need to be populated:
SELECT w.*
FROM warehouse w
WHERE w.local_id IS NULL;
...then, for each of those rows, identify the best match (using the values from the previous query as w.*):
SELECT local.id
FROM local
WHERE local.product_brand=w.product_brand
ORDER BY levenstein(local.product_name, w.product_name) ASC
LIMIT 0,1
But this will find the best match, even if the 2 strings are completely different! Hence....
SELECT local.id
FROM local
WHERE local.product_brand=w.product_brand
AND levenstein(local.product_name, w.product_name) <
LEAST(LENGTH(local.product_name), LENGTH(w.product_name))/2
ORDER BY levenstein(local.product_name, w.product_name) ASC
LIMIT 0,1
...requires the edit distance to be less than half the length of the shorter name, i.e. roughly half the string must match.
So this can be implemented in a single update statement:
UPDATE warehouse w
SET local_id=(
SELECT local.id
FROM local
WHERE local.product_brand=w.product_brand
AND levenstein(local.product_name, w.product_name) <
LEAST(LENGTH(local.product_name), LENGTH(w.product_name))/2
ORDER BY levenstein(local.product_name, w.product_name) ASC
LIMIT 0,1
)
WHERE local_id IS NULL;
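The per-row logic of that UPDATE can be sketched in Python to check the threshold before running it against real data. best_local_id() and the (id, brand, name) tuples are hypothetical; the rule mirrors the SQL: same brand, distance below half the shorter name's length, smallest distance wins.

```python
def levenshtein(a, b):
    """Standard edit distance (two-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]

def best_local_id(brand, name, local_rows):
    """Id of the closest local (id, brand, name) row, or None if nothing is close enough."""
    best = None  # (distance, id)
    for local_id, local_brand, local_name in local_rows:
        if local_brand != brand:
            continue  # only compare within the same brand
        distance = levenshtein(local_name, name)
        # Same threshold as the SQL: below half the length of the shorter name.
        if distance < min(len(local_name), len(name)) / 2:
            if best is None or distance < best[0]:
                best = (distance, local_id)
    return None if best is None else best[1]

local = [(1, "ACME", "WINTER SPORT 3D"), (2, "ACME", "SUMMER CONTACT")]
print(best_local_id("ACME", "WINT.SPORT3D", local))  # -> 1
```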
Try Soundex. Using the standard four-character code, all of your examples except the last resolve to W532, while the last one (WINTER SPORT 3D) resolves to W536. So, you could:
Add a column called SoundexValue to PRODUCT and MODEL, and calculate the Soundex value for each product and model.
Compare the Soundex values in the PRODUCT table to the ones in the MODEL table. You may have to use a range (+/- 5) to get a higher rate of matching.
Follow the 80/20 rule, i.e. spend 80% of your manual effort on the 20% that don't match easily.
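A minimal Python sketch of standard four-character American Soundex, useful for checking which names collide before adding the SoundexValue column. It ignores non-letters (dots, digits, spaces), as MySQL's SOUNDEX() does; note that MySQL's SOUNDEX() can return codes longer than four characters, so SUBSTRING(SOUNDEX(name), 1, 4) would be needed to compare against this standard form.

```python
# Letter-to-digit table of American Soundex; vowels and Y give "", H and W are skipped.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit

def soundex(text):
    """Four-character Soundex code of text, or "" if it contains no letters."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    last = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "HW":
            continue  # H and W do not separate letters with the same digit
        digit = CODES.get(letter, "")  # a vowel gives "" and resets the run
        if digit and digit != last:
            code += digit
            if len(code) == 4:
                break
        last = digit
    return code.ljust(4, "0")

for name in ["WINT.SPORT3D", "WINT.SPORT 3D", "WINTER SPORT 3D"]:
    print(name, soundex(name))
```

The example shows the weakness the other answer mentions: the shortened names and the correct name get different codes (W532 vs W536), so Soundex alone will not link them.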