MySQL: match URLs

I am inserting URLs into a MySQL table. For example, I have inserted 8 entries as below:
url
-----------------------------
http://example.com
http://www.example.com
http://example.com/
http://www.example.com/
http://example.com/sports
http://www.example.com/sports
http://example.com/sports/
http://www.example.com/sports/
Now, how can I write a query to match example.com that returns the first 4 entries, since they are the same URL? Similarly, how do I write a query to get the last 4 entries, since they are also the same? Even with a huge number of entries the query should be fast. Is that possible?

Well, if you have those links in a single table, you could get them like:
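SELECT * FROM `table` WHERE url LIKE '%example.com%'
Is this fast? No: the leading % wildcard stops MySQL from using an index on url, so it requires a full table scan. It would also match all 8 rows, not just the first 4.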
If I were you, I would model my DB to hold those URLs in 2 tables:
links
  id
  base_url - holds example.com
related_links
  id
  link_id - FK to links
  subdomain - holds www.
  relative_url - holds /sports/
Edit - to answer comment:
Your DB is not normalized right now. You hold multiple records for "the same thing", so you are not benefiting from the advantages of a DB. DBs are useful when working with structured data, but your query needs to perform string operations, and pretty complex ones. So, while it would probably be possible to return the results you need with the current form of the DB, it won't be a trivial task, and performance would definitely suck.
My recommendation: modify the DB. At least add the columns subdomain and relative_url to your table and hold this information as separately as possible, so you can run aggregated queries on it.
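A minimal sketch of the "at least add the columns" suggestion, using Python's built-in sqlite3 module as a stand-in for MySQL. The splitting rules (treat a leading www. as the subdomain, ignore a trailing slash) and the table and index names are assumptions for illustration. Once the parts live in their own indexed columns, "the same URL" becomes an equality lookup instead of a LIKE '%...%' scan.

```python
import sqlite3
from urllib.parse import urlsplit

# The eight sample URLs from the question.
URLS = [
    "http://example.com",
    "http://www.example.com",
    "http://example.com/",
    "http://www.example.com/",
    "http://example.com/sports",
    "http://www.example.com/sports",
    "http://example.com/sports/",
    "http://www.example.com/sports/",
]

def split_url(url):
    """Return (base_url, subdomain, relative_url) for a URL.

    Assumed rules: a leading 'www.' is the subdomain, and a trailing '/'
    on the path is ignored.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is already lower-cased
    subdomain = ""
    if host.startswith("www."):
        subdomain, host = "www.", host[len("www."):]
    return host, subdomain, parts.path.rstrip("/")

def build_db(urls):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE urls (url TEXT, base_url TEXT, subdomain TEXT, relative_url TEXT)")
    # Composite index so equality lookups on the split columns avoid a full scan.
    conn.execute("CREATE INDEX idx_urls_base_rel ON urls (base_url, relative_url)")
    conn.executemany(
        "INSERT INTO urls VALUES (?, ?, ?, ?)",
        [(u, *split_url(u)) for u in urls])
    return conn

def same_url(conn, url):
    """All stored URLs that split to the same base_url and relative_url."""
    base, _, rel = split_url(url)
    rows = conn.execute(
        "SELECT url FROM urls WHERE base_url = ? AND relative_url = ? ORDER BY rowid",
        (base, rel))
    return [row[0] for row in rows]
```

With this, same_url(build_db(URLS), "http://example.com") returns the first four entries and a lookup for any of the /sports variants returns the last four. In MySQL the split would more likely happen once at insert time in the application, exactly as build_db does here.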

Related

Cache, Database, Over 400k Listing

In my MySQL database I have a table of products which contains almost 625k rows. The table has 162 columns.
Now there is a search box on my home page where you can search for anything, and if your search term matches any of my product titles, it gives you a list of 15 products. This is similar to Amazon and other e-commerce websites.
What I did so far was create a JSON file with all the product IDs and titles. When the user types at least 3 characters into the search field, an AJAX request fetches the list. My issue is that the JSON file is almost 12MB, and the AJAX call fetches it every time the user types or deletes a character. It worked fine on my local machine, but as soon as I made it live it stopped working for users with internet connections slower than 5 Mbps. So I am looking for advice: how do I make it as fast as Amazon, i.e. search with auto-suggestion over 625K products?
I am really sorry, but there is no better advice to give here than "go do some reading on database design and schema normalization".
If you have 162 columns in a table, you will never be able to do an efficient search. The database (especially MySQL) will not hold the table in memory, and indexes will not help either. Yes, you can throw it all into an Elasticsearch instance and it will fix some of your problems. But, honestly, that solution does not clean up the mess you have.
You should have a separate table holding the searchable information (titles, names, etc.) in one column (plus numeric columns for prices, etc., if needed). This metadata table should reference the main table, and its text column should have a FULLTEXT index. That way you query for matches, filter the results, and JOIN the relevant rows from the main table. This works quickly and uses very few resources.
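A rough sketch of the narrow search table idea, again with sqlite3 as a stand-in. The table names (products, product_search) are hypothetical, and a prefix LIKE on an ordinary indexed column is used only because MySQL's FULLTEXT syntax is not available in SQLite; in MySQL the term column would get a FULLTEXT index and be queried with MATCH(...) AGAINST(...). What it illustrates is that the server returns at most 15 rows per keystroke instead of the browser downloading a 12MB JSON file.

```python
import sqlite3

def create_schema(conn):
    # Stand-in for the wide main table (the real one has 162 columns).
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT, price REAL)")
    # Narrow search table: only the searchable text plus a reference back.
    conn.execute(
        "CREATE TABLE product_search ("
        " product_id INTEGER NOT NULL REFERENCES products(id),"
        " term TEXT NOT NULL)")
    conn.execute("CREATE INDEX idx_product_search_term ON product_search (term)")

def add_product(conn, pid, title, price):
    conn.execute("INSERT INTO products (id, title, price) VALUES (?, ?, ?)", (pid, title, price))
    conn.execute("INSERT INTO product_search (product_id, term) VALUES (?, ?)", (pid, title))

def escape_like(text):
    # Make %, _ and \ typed by the user match literally.
    return text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")

def suggest(conn, typed, limit=15):
    """Return at most `limit` (id, title) rows whose search term starts with `typed`."""
    if len(typed) < 3:  # the question only searches from 3 characters on
        return []
    return conn.execute(
        "SELECT p.id, p.title FROM product_search s"
        " JOIN products p ON p.id = s.product_id"
        " WHERE s.term LIKE ? ESCAPE '\\'"
        " ORDER BY s.term LIMIT ?",
        (escape_like(typed) + "%", limit)).fetchall()
```

The browser would then call a small endpoint wrapping suggest() on each keystroke, receiving a few hundred bytes rather than the whole catalogue.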

What is the fastest way to group my records?

My site shows collections of links on different subjects. These links are divided into two types: web and images. My database will have millions (probably more than ten million) of these records. When the page loads, I need to show the user the web and image links for the particular subject of that page. So the first question is:
Do I create two separate, smaller tables, one each for the web and image links, and then query each one, or do I create one huge table (with correct indexes) for both and make one query? Where will I get better performance? If the single table and single query are more efficient, then my next question is:
What would be the most efficient way to subdivide the two types for presentation? Should I use group by, or should I use php to divide my result array into the two types?
TIA!
You can get similar performance using one table for all objects or separate tables for images and websites. If you have two separate tables, a UNION of the results would return everything you need.
The main reason to split them is whether they are really different (from your application's point of view). That is, if you are going to end up using a lot of queries like
select * from objects where type='image';
then it might make sense to have two tables.
Also, GROUP BY is not a way of grouping the different results; it is a way of aggregating them.
So, for instance, you can use
select type, count(*) from objects group by type
to get
| image | 100000 |
| web | 2000000 |
but it will not return the objects separated. To get them "grouped", you can either use a query for each one, or use an ordering and then have the logic in the application to divide the results.
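The difference between aggregating with GROUP BY and splitting ordered rows in the application can be sketched like this (sqlite3 as a stand-in; the objects table and its subject_id column are hypothetical):

```python
import sqlite3
from itertools import groupby

def make_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE objects (id INTEGER PRIMARY KEY, subject_id INTEGER, type TEXT, url TEXT)")
    conn.execute("CREATE INDEX idx_objects_subject_type ON objects (subject_id, type)")
    conn.executemany("INSERT INTO objects (subject_id, type, url) VALUES (?, ?, ?)", rows)
    return conn

def counts_by_type(conn):
    # GROUP BY aggregates: one row per type, not the objects themselves.
    return dict(conn.execute("SELECT type, COUNT(*) FROM objects GROUP BY type"))

def links_by_type(conn, subject_id):
    # One ordered query; the application splits consecutive rows by type.
    rows = conn.execute(
        "SELECT type, url FROM objects WHERE subject_id = ? ORDER BY type, id",
        (subject_id,))
    return {t: [url for _, url in grp] for t, grp in groupby(rows, key=lambda r: r[0])}
```

counts_by_type gives the aggregated view shown above, while links_by_type returns the actual URLs, already separated into one list per type.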
It's possible you'll get slightly better performance from just one table, but this decision should be primarily guided by whether the nature of data or constraints is different or not.
There is another (more important from the performance perspective) decision you'll have to make: how do you want to cluster the data (all InnoDB tables are clustered)?
If you want to have an excellent performance getting all the links of a given page, use an identifying relationship, producing a natural key in the link table(s):
The LINK table is effectively just a single B-tree, with the page PK[1] at its leading edge, which physically groups together the rows that belong to the same page. The following query can be satisfied by a simple index range scan with minimal I/O:
SELECT URL
FROM LINK
WHERE PAGE_ID = <whatever>
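A sketch of the identifying-relationship layout, using an SQLite WITHOUT ROWID table as a rough analogue of an InnoDB clustered primary key (an assumption: the two storage engines differ in detail). The query plan check shows the lookup going through the primary key rather than a full table scan.

```python
import sqlite3

def create_link_table(conn):
    # WITHOUT ROWID stores rows inside the primary-key B-tree, so rows with
    # the same PAGE_ID sit next to each other (like an InnoDB clustered index).
    conn.execute(
        "CREATE TABLE LINK ("
        " PAGE_ID INTEGER NOT NULL,"
        " URL TEXT NOT NULL,"
        " PRIMARY KEY (PAGE_ID, URL)"
        ") WITHOUT ROWID")

QUERY = "SELECT URL FROM LINK WHERE PAGE_ID = ?"

def links_for_page(conn, page_id):
    return [row[0] for row in conn.execute(QUERY, (page_id,))]

def query_plan(conn):
    # The 4th column of EXPLAIN QUERY PLAN output is the human-readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1,))]
```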
If you use separate tables, you can simply run two queries. Many client APIs support executing two queries in a single database round-trip. If PHP doesn't, you can UNION the two queries to save one round-trip:
SELECT *
FROM (
SELECT 1 LINK_TYPE, URL
FROM IMAGE_LINK
WHERE PAGE_ID = <whatever>
UNION ALL
SELECT 2, URL
FROM WEB_LINK
WHERE PAGE_ID = <whatever>
) AS ALL_LINKS
ORDER BY LINK_TYPE
The above query will give you...
LINK_TYPE URL
1 http://somesite.com/foo.jpeg
1 http://somesite.com/bar.jpeg
1 http://somesite.com/baz.jpeg
...
2 http://somesite.com/foo.html
2 http://somesite.com/bar.html
2 http://somesite.com/baz.html
...
...which will be very easy to separate at the client level.
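The UNION ALL query above can be run from Python's sqlite3 module with the client-side split done on LINK_TYPE. The IMAGE_LINK and WEB_LINK column definitions are assumptions, and the derived table gets an alias (ALL_LINKS, a hypothetical name) because MySQL rejects derived tables without one.

```python
import sqlite3

UNION_QUERY = """
SELECT *
FROM (
    SELECT 1 AS LINK_TYPE, URL FROM IMAGE_LINK WHERE PAGE_ID = :page
    UNION ALL
    SELECT 2, URL FROM WEB_LINK WHERE PAGE_ID = :page
) AS ALL_LINKS
ORDER BY LINK_TYPE
"""

def create_tables(conn):
    for table in ("IMAGE_LINK", "WEB_LINK"):
        conn.execute(
            f"CREATE TABLE {table} (PAGE_ID INTEGER NOT NULL, URL TEXT NOT NULL,"
            " PRIMARY KEY (PAGE_ID, URL))")

def split_links(conn, page_id):
    """One round trip; LINK_TYPE tells the client which list each URL belongs to."""
    images, web = [], []
    for link_type, url in conn.execute(UNION_QUERY, {"page": page_id}):
        (images if link_type == 1 else web).append(url)
    return images, web
```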
If you use a single table, you can then separate the URLs by their extension at the client level, or introduce an additional field in the LINK PK: {PAGE_ID, LINK_TYPE, URL}, which should make the following query very efficient:
SELECT LINK_TYPE, URL
FROM LINK
WHERE PAGE_ID = <whatever>
ORDER BY LINK_TYPE
Note that the order of fields in the PK matters: placing LINK_TYPE at the end would force the DBMS to sort the rows after the index range scan instead of reading them already in order.
[1] Whatever it may be; I just used PAGE_ID as an example.
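Why the column order in the primary key matters can be checked with SQLite's EXPLAIN QUERY PLAN on two WITHOUT ROWID tables that differ only in PK order (a stand-in for InnoDB; the table names are hypothetical). With {PAGE_ID, LINK_TYPE, URL} the rows come out of the index already ordered by LINK_TYPE; with LINK_TYPE last, SQLite still uses the index for PAGE_ID but then needs a temporary B-tree to sort, roughly what MySQL's EXPLAIN reports as "Using filesort".

```python
import sqlite3

QUERY = "SELECT LINK_TYPE, URL FROM {table} WHERE PAGE_ID = 1 ORDER BY LINK_TYPE"

def create(conn, table, pk_columns):
    conn.execute(
        f"CREATE TABLE {table} (PAGE_ID INTEGER NOT NULL, LINK_TYPE INTEGER NOT NULL,"
        f" URL TEXT NOT NULL, PRIMARY KEY ({pk_columns})) WITHOUT ROWID")

def plan(conn, table):
    # Detail strings of the query plan, e.g. whether a temp B-tree sort is needed.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY.format(table=table))]

conn = sqlite3.connect(":memory:")
create(conn, "LINK_GOOD", "PAGE_ID, LINK_TYPE, URL")  # LINK_TYPE right after PAGE_ID
create(conn, "LINK_BAD", "PAGE_ID, URL, LINK_TYPE")   # LINK_TYPE moved to the end
```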
It depends on how close the web data is to the image data. If the data is basically just the link, one table fits better, with a column to differentiate between web and image links (and possibly other types later, like CSS, JS...):
Links: (id, link, type)
Adding an index on type, or on (type, link), will help the grouping (by type) and matching searches by (type, link).
If however, web and img data are different in such a way that you don't want to mix apples and oranges, like
Web: (wid, wlink, rating, ...)
Img: (iid, ilink, width, height, mbsize, camera, datetaken, hasexif...)
In this case, apart from the link, the two tables don't have much in common. Since image links and web links differ, there is not even a "gain" from having the same link for both kinds of data. Another advantage (also possible with one table, but it makes more sense here) is being able to link both kinds of data in another table:
Relations: (wid,iid)
which lets you maintain the relationship between web sites and images, since an image may be used by several web sites, and web sites use several images. Index both wid and iid.
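A sketch of the two-table design with the optional Relations table (sqlite3 as a stand-in; only a few of the listed columns are included). The composite primary key (wid, iid) covers lookups by wid, and a separate index covers lookups by iid.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Web (wid INTEGER PRIMARY KEY, wlink TEXT NOT NULL, rating INTEGER);
CREATE TABLE Img (iid INTEGER PRIMARY KEY, ilink TEXT NOT NULL, width INTEGER, height INTEGER);
-- Many-to-many: one row per (web site, image) pair.
CREATE TABLE Relations (
    wid INTEGER NOT NULL REFERENCES Web(wid),
    iid INTEGER NOT NULL REFERENCES Img(iid),
    PRIMARY KEY (wid, iid)                           -- serves lookups by wid
);
CREATE INDEX idx_relations_iid ON Relations (iid);  -- serves lookups by iid
"""

def images_of_site(conn, wid):
    rows = conn.execute(
        "SELECT i.ilink FROM Relations r JOIN Img i ON i.iid = r.iid"
        " WHERE r.wid = ? ORDER BY i.ilink", (wid,))
    return [row[0] for row in rows]

def sites_using_image(conn, iid):
    rows = conn.execute(
        "SELECT w.wlink FROM Relations r JOIN Web w ON w.wid = r.wid"
        " WHERE r.iid = ? ORDER BY w.wlink", (iid,))
    return [row[0] for row in rows]
```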
My preference goes to the two tables (with optional Relations link).
Regarding queries from PHP, using UNION you can obtain the data from two tables in one query.
Do I create two separate, smaller tables or one huge table?
Go for one table.
What would be the most efficient way to subdivide the two types for presentation?
It depends on the specific search criteria.

How to extract relevant data from MySQL?

I'm using a table named "url2" with the MySQL InnoDB engine. It holds a lot of data: the full HTML of each page, the URL of the page, and so on. When I use the following SQL query I get a lot of results:
SELECT url FROM url2 WHERE html LIKE '%Yuva%' OR url LIKE '%Yuva%'
The search term Yuva changes according to the user's request.
It selects a lot of data, most of which I don't need. How can I avoid that?
The output of the above query is:
www.123musiq.com
www.123musiq.com/home.html
www.123musiq.com/yuva.html
www.sensongs.com/
www.sensongs.com/hindi.html
www.sensongs.com/yuva.html
The output I need, sorted by relevance, is:
www.123musiq.com/yuva.html
www.sensongs.com/yuva.html
www.sensongs.com/hindi.html
Following a comment from a friend, I changed the table to MyISAM, but I get about 25 results from 123musiq.com first and only after that the sensongs.com ones. How can I get 2 from 123musiq.com and 2 from sensongs.com, ordered by relevance?
It seems you're asking for a full-text index. In MySQL versions before 5.6 these were only available on MyISAM tables; since MySQL 5.6, InnoDB supports FULLTEXT indexes as well.
On those older versions, since you're using InnoDB tables, the easiest solution is to create a new (MyISAM) table with only the text content and a key to join with the original table (this also helps with seek efficiency in some common cases).
Perhaps you want to use LIMIT?
SELECT * FROM url2 WHERE html LIKE '%Yuva%' OR url LIKE '%Yuva%' LIMIT 2
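Neither answer directly addresses the "2 per site, ordered by relevance" part. One crude alternative, sketched here with sqlite3 as a stand-in, is to compute a relevance score with CASE expressions and cap the rows per site in the application. The weights (URL match 2, HTML match 1) and the per-site cap are assumptions, and this still scans the whole table; a FULLTEXT index with MATCH ... AGAINST would give a real relevance ranking. LIKE case sensitivity in MySQL also depends on the column's collation.

```python
import sqlite3

def search(conn, term, per_host=2):
    """Rank url2 rows by a crude score and keep at most `per_host` rows per site.

    Assumed scoring: a match in the URL counts 2, a match in the HTML counts 1.
    """
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT url,"
        " (CASE WHEN url LIKE :p THEN 2 ELSE 0 END)"
        " + (CASE WHEN html LIKE :p THEN 1 ELSE 0 END) AS score"
        " FROM url2"
        " WHERE url LIKE :p OR html LIKE :p"
        " ORDER BY score DESC, url",
        {"p": pattern})
    per_site = {}
    results = []
    for url, _score in rows:
        # URLs are stored without a scheme, e.g. www.sensongs.com/yuva.html
        host = url.split("/", 1)[0]
        if per_site.get(host, 0) < per_host:
            per_site[host] = per_site.get(host, 0) + 1
            results.append(url)
    return results
```

With the six URLs from the question, both yuva.html pages rank first, followed by one more page from each site.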

Correct way to set up MySQL tables

I can't work out what I should be doing here...
I have a database with around 20,000 records. Each of these records has about 20 columns to it.
I want to add 20 or so additional columns to this table, which would be along the lines of a load of different URLs for each record. Mostly, these will be blank.
What's the "right" way of doing this:
1. Add 20 additional columns (youtubeurl, facebookurl, etc.)
   (Benefits: only one URL call // Drawbacks: makes my database much larger)
2. Add an additional table with three columns, 'ID', 'URLType', 'URL', which I can call separately?
   (Benefits: keeps main table much smaller // Drawbacks: additional SQL query required)
What should I be doing?
Everything else being equal, I would go with option (2). This allows you to keep your data normalized and offers flexibility if you need to add more sites in the future.
FWIW, this does not require an extra query to SELECT data, as you can just JOIN to the other table. But of course, it would require extra INSERT / UPDATE queries.
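A sketch of option 2 showing that reading a record together with its URLs is still a single SELECT, using a LEFT JOIN so that records without any URLs still appear (sqlite3 as a stand-in). The records table, its name column, and the one-URL-per-type primary key are assumptions; ID, URLType and URL come from the question.

```python
import sqlite3

SCHEMA = """
CREATE TABLE records (ID INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Option 2: one row per URL that actually exists; blank URLs take no space.
CREATE TABLE record_urls (
    ID INTEGER NOT NULL REFERENCES records(ID),
    URLType TEXT NOT NULL,
    URL TEXT NOT NULL,
    PRIMARY KEY (ID, URLType)   -- assumes at most one URL per type per record
);
"""

def records_with_urls(conn):
    """One SELECT: the LEFT JOIN keeps records that have no URLs at all."""
    rows = conn.execute(
        "SELECT r.ID, r.name, u.URLType, u.URL"
        " FROM records r LEFT JOIN record_urls u ON u.ID = r.ID"
        " ORDER BY r.ID, u.URLType")
    result = {}
    for rid, name, url_type, url in rows:
        entry = result.setdefault(rid, {"name": name, "urls": {}})
        if url_type is not None:  # NULL when the record has no URLs
            entry["urls"][url_type] = url
    return result
```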
Option 2 is almost certainly the better option. It makes it easier for you to add new URL types in the future (just invent a new URLType instead of creating a new column). Pages that use these URLs then don't have to be modified to accommodate the new type of URL; they'll just pick it out of the table. In other words, you only have to make a change in one place instead of several.
If people mostly have only a few of these urls, splitting it into a separate table is almost certainly the way to go.
Everything you're adding is a URL. Each URL is related to one (or maybe more) of your current records. So either:
for URLs that belong to only one record:
urls table with url and FK to records table
or, for URLs that can relate to more than one record:
urls table with url_id and url
linking table with record_id and url_id

Showing data and counting from multiple databases in MySQL

I have two tables that are related to each other. I've spent the last hour googling around without finding any concrete answers that work in my case.
Table 1 is called Clients and table 2 is called Projects. I need to list all the client names, each followed by the number of projects related to that client and the project titles.
For example:
Client 1 (2 projects)
- Project 1 title
- Project 2 title
Client 2 (0 projects)
Client 3 (1 project)
- Project 1 title
What is the simplest and easiest way to do this?
There are basically three parts to your query.
Joining the tables
Grouping the output by client
Counting the projects per client
The first is easy. It involves finding the client ID field in each table and using a JOIN clause that specifies the two columns (one in each table) to correlate on. Use a LEFT JOIN from Clients to Projects so that clients without projects (like Client 2) still appear, with NULL project columns. This will give you one row per project, plus one row for each client with no projects, each containing the matching client's information. This is almost what you're asking for.
It is tricky to do the second and third parts in the same query, and it is not something I would recommend. If you are going to put this in a program, you can easily post-process the result of the query. To do this, add an ORDER BY clause that sorts by client. That will put all the projects for each client in consecutive rows.
Now you can write a loop to process the output. As it does so, it has to watch for two things:
when the client ID changes
counting the projects
By doing this, it can easily display a "group header" for each client, and a "group footer" for the number of projects.
You don't say anything about your app, but I'd recommend separating querying from displaying. Let SQL get the data for you and then let something else do presentation.
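A sketch of the query-then-loop approach in Python, with sqlite3 as a stand-in. The column names (id, name, client_id, title) are assumptions. It uses a LEFT JOIN so that clients with no projects (like Client 2) still appear, and, because the desired output puts the count in each client's header rather than a footer, it collects that client's projects before emitting the header line.

```python
import sqlite3
from itertools import groupby

def demo_db():
    """Sample data matching the example in the question (column names assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Clients (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE Projects (id INTEGER PRIMARY KEY,
                               client_id INTEGER NOT NULL REFERENCES Clients(id),
                               title TEXT NOT NULL);
        INSERT INTO Clients VALUES (1, 'Client 1'), (2, 'Client 2'), (3, 'Client 3');
        INSERT INTO Projects VALUES (1, 1, 'Project 1 title'), (2, 1, 'Project 2 title'),
                                    (3, 3, 'Project 1 title');
    """)
    return conn

def client_report(conn):
    # SQL does the joining and ordering; Python does the grouping and display.
    rows = conn.execute("""
        SELECT c.id, c.name, p.title
        FROM Clients c
        LEFT JOIN Projects p ON p.client_id = c.id
        ORDER BY c.id, p.id""")
    lines = []
    for (_, name), grp in groupby(rows, key=lambda r: (r[0], r[1])):
        titles = [title for _, _, title in grp if title is not None]
        noun = "project" if len(titles) == 1 else "projects"
        lines.append(f"{name} ({len(titles)} {noun})")
        lines.extend(f"- {t}" for t in titles)
    return lines
```

Printing "\n".join(client_report(demo_db())) reproduces the layout from the question, including the "Client 2 (0 projects)" line.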