MySQL slow loading - Pagination

We are using the latest version of Joomla (3.6.5) and the latest version of J Directory
http://www.cmsjunkie.com/j-businessdirectory
and we are having a bit of a disagreement with the developers.
Our directory has grown over the last year, and page loading is now very poor where we have more than 500 entries per category. J Directory say it's because we have a large number of entries.
Please see the URL below, which is loading slowly; we have around 1,100 entries associated with this category:
http://www.shoppingonline.ie/category/clothes-shops-womens
But this should not make any difference, right? There are thousands of websites online that have huge databases and display large amounts of results.
See below an example of a classifieds website that shows 66,914 ads for "cars" in Ireland:
donedeal.ie/all?words=cars&area=Ireland
So what I am saying to the developers is that we should not be retrieving all the results, only the results that show on the page - currently 20 - and then pagination kicks in and you load the next 20 results, and so on.
Can you confirm that this is the way it should work? Surely the example above does not retrieve all 66,914 rows; the performance would be very poor indeed if it did, and it currently loads very quickly.
Please see below the response from the dev:
You must have misunderstood the whole situation.
We do not retrieve all the results at once. As you can see, only 20 items are retrieved at a time.
The issue is that MySQL has to go through all 1000 results that match the search criteria, order them by the sort criteria, and then pick out the 20 results within the requested page window.
That is what causes the performance penalty. This is not a bug; it is a MySQL performance limitation.
As you can see, the search performs well for fewer results, which proves there is no bug or coding issue.
I hope we have clarified the situation now.
The sites that you are referring to must be using other database engines (paid database engines) or indexes like Google's that are more efficient.

This load time is definitely going to annoy customers. They don't appear to be limiting the number of results they retrieve. The link you posted to your page looks like this and takes about 17 seconds to load:
http://www.shoppingonline.ie/category/clothes-shops-womens
If I click page #2, it takes only 8.8 seconds to load (which is still terrible, but better). Notice the URL:
http://www.shoppingonline.ie/component/jbusinessdirectory/search/266?Itemid=101&controller=search&categorySearch=266&orderBy=packageOrder desc&limit=20&start=20
The &limit=20 is probably how your developers tell your website to retrieve a maximum of 20 rows from your database, whereas without it, the site seems to be grabbing everything. They need to put the same limit on the first page.
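For illustration, that kind of paginated retrieval normally comes down to a LIMIT/OFFSET clause; here is a minimal PHP sketch, with the caveat that the table and column names are hypothetical (the extension's real schema isn't shown anywhere in this thread):

<?php
// Hypothetical sketch: fetch one page of 20 listings. The limit/start values
// mirror the &limit=20&start=20 parameters visible in the URL above.
$pdo = new PDO('mysql:host=localhost;dbname=directory;charset=utf8mb4', 'user', 'pass');

$perPage = 20;
$start   = max(0, (int)($_GET['start'] ?? 0));

$stmt = $pdo->prepare(
    'SELECT id, name, package_order
       FROM listings                 -- made-up table name
      WHERE category_id = :cat
   ORDER BY package_order DESC
      LIMIT :limit OFFSET :offset'
);
$stmt->bindValue(':cat',    266,      PDO::PARAM_INT);
$stmt->bindValue(':limit',  $perPage, PDO::PARAM_INT);
$stmt->bindValue(':offset', $start,   PDO::PARAM_INT);
$stmt->execute();
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

Note that even with LIMIT, MySQL may still have to sort every matching row before it can discard all but 20 of them, which is exactly the penalty the developer described; the indexing sketch further down addresses that part.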
It also seems like your traffic is being routed to doubleverify.com and then to your website. A tool like Google Analytics would track your web traffic after your user lands on your page, eliminating the need for them to get bounced all over the web. I'd sit down with your devs and ask them exactly how many sites the user gets bounced through before landing on your actual page. That stuff is time-consuming.
As for your database, I've never used J Directory, but almost all databases give you the ability to put indexes on your fields. Indexing will greatly reduce the time it takes for database queries to complete.
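As a hedged illustration of that indexing advice (again with made-up table and column names), a composite index that matches both the WHERE filter and the ORDER BY lets MySQL read the 20 rows it needs in sorted order and stop, instead of sorting all 1,100 matches on every page view:

<?php
// Hypothetical sketch: add a composite index covering the filter and the sort,
// then check with EXPLAIN that the filesort is gone.
$pdo = new PDO('mysql:host=localhost;dbname=directory', 'user', 'pass');

// Index on (category_id, package_order): MySQL can walk the index in sorted
// order and stop after LIMIT rows.
$pdo->exec('CREATE INDEX idx_cat_order ON listings (category_id, package_order)');

$plan = $pdo->query(
    'EXPLAIN SELECT id FROM listings
      WHERE category_id = 266
   ORDER BY package_order DESC LIMIT 20'
)->fetchAll(PDO::FETCH_ASSOC);
print_r($plan); // "Using filesort" should no longer appear in the Extra column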
Your devs also have a LOT of blank lines in their HTML; this is probably doubling or tripling the size of your web page, which is not helping it load quickly. They can still have a readable format without all the extra blank lines.

I have tested that page. It is loading very, very slowly, taking around 16 seconds.
You should hire an expert freelancer who has done a lot of work in website speed optimization.

We had a similar performance problem with Mosets Tree. The way to uncover the real problem is to check the MySQL slow query log (this has to be enabled in the my.cnf file). By the way, the fact that they are only displaying 20 records does not mean that they are retrieving just those 20 records. Then again, that might not be the problem; it might be caused by some very complex queries and/or queries without proper indexes.
Again, the key to getting to the bottom of the problem is the slow query log. You can also enable Joomla's debug mode and check how much time each query takes.
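If you can't edit my.cnf (on managed hosting, say), the log can usually be switched on at runtime as well; a minimal sketch, with the one-second threshold being just an example value:

<?php
// Hypothetical sketch: enable the slow query log at runtime (needs the SUPER
// privilege). The equivalent my.cnf lines are:
//   slow_query_log      = 1
//   long_query_time     = 1
//   slow_query_log_file = /var/log/mysql/slow.log
$pdo = new PDO('mysql:host=localhost', 'root', 'pass');
$pdo->exec("SET GLOBAL slow_query_log = 'ON'");
$pdo->exec('SET GLOBAL long_query_time = 1'); // log anything slower than 1 second
// Reproduce the slow page load, then read the log file (or summarize it with
// mysqldumpslow) to see which query is eating the 16 seconds.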

Related

Is it safe to query my database for every page load?

Meaning every page in my website queries something from the database.
The query itself is very small and the time it takes is unnoticeable, but I'm wondering if it's okay to do this for every page, since I don't really know much about how querying the database works and whether doing it multiple times, in my case on every page load, affects anything significantly.
As with all things, the answer is it depends. :-)
Most web sites you visit query something from a database on every page load. If the queries are crafted well, they look up just the data they need and avoid scanning through a big database. You might like my presentation How to Design Indexes, Really (video) to help with this.
Another strategy is to use a fast cache in RAM for data that is needed frequently. RAM is thousands of times faster than disk drives. You might like to familiarize yourself with the Numbers Everyone Should Know. Those numbers are just examples, but the intention is to get programmers to think about the fact that moving data around has different costs as you use RAM vs. disk vs. network vs. CPU.
P.S.: Please don't buy into the myth that you're not good at computers because you're a woman. Everyone starts out as a novice, no matter what their gender or background. Only through practice and study do any of us learn this stuff. I recommend seeing Hidden Figures, the story of the women who did pioneering math and programming for NASA.
Another notable woman is Margaret Hamilton, who practically invented the profession of "software engineering."
Yes, you are OK to query the database on every page load.
Think about websites like Facebook. When you visit the site it needs to know who you are - it gets that from a database. It needs to know all of the status updates that it's going to show you - it gets that from a database. When you hit the bottom of the news feed and it gets more for you to read - it gets that from a database.
That's normal. Most web applications have to query the database for each page load (usually several times), since most of the page content comes from the database.
If you're concerned about performance, think about this: is the query different for each page? Or is it loading the same data over and over? If it keeps querying the same thing (like the current user's name), you can improve performance by storing the data in the application's session state. But if it's different (like how many unread messages the user has), you'll need to run the query each time.
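A minimal sketch of that session-state idea in PHP (hedged: the query and the session keys are placeholders, and it assumes user_id was stored at login):

<?php
// Hypothetical sketch: cache a per-user value in the session so the same
// query doesn't have to run on every page load.
session_start();
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

if (!isset($_SESSION['user_name'])) {           // first page load only
    $stmt = $pdo->prepare('SELECT name FROM users WHERE id = ?');
    $stmt->execute([$_SESSION['user_id']]);     // assumes user_id was set at login
    $_SESSION['user_name'] = (string)$stmt->fetchColumn();
}
echo $_SESSION['user_name'];                    // later loads skip the query entirely

Something per-page like an unread-message count, by contrast, would stay a live query, exactly as the answer says.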
Imagine visiting a website with features like 'who's online' or a messaging system: whenever you click to another page, the site needs to update the database so that it can keep track of where you are on the site. If you receive a private message, it is accessible on the next page click, since the database was updated when the message was sent. The trick is to run queries only to perform the tasks required at that time. For instance, if you were looking for a username, searching the whole database would run a lot slower, as it needs to scan everything; searching by a particular column is faster, and it is faster still if you use things like LIMIT in the query.

how to implement a saved-search scenario

What is a saved search?
A saved search is for users who don't find their desired results in the advanced search: they push the "Save My Search Criteria" button, we save the search criteria, and when matching data is posted to the website we inform the user: "hey user, the item(s) you were looking for exist now, come and visit".
Saved searches are useful for sites with complex search options, or sites where users may want to revisit or share dynamic sets of search results.
We already have advanced search and don't need to implement a new search; what we require is a good, high-performance scenario for the saved-search mechanism.
We have a website where users post about 120,000 posts per day, and we are going to implement a SAVED SEARCH scenario (something like what https://www.gumtree.com/ does): users use the advanced search, but if they don't find their desired content they just save the search criteria, and if results appear on the website later we inform them with a notification.
We are using Elasticsearch and MySQL on our website. We haven't implemented anything yet and are just thinking about it to find a good solution that can handle a high rate of data. The problem is the scale of the work: we have a lot of posts per day, and we also guess users will use this feature a lot, so we are looking for a scenario that can handle this scale of work easily and with high performance.
Suggested solutions, but not the best:
One quick solution is to save the saved searches in a saved-search index in Elasticsearch, then run a cron job that, for every saved search, fetches results from the posts index, and if there are any results, pushes a record into RabbitMQ to notify the corresponding user.
Alternatively, when a user posts an item to the website, we check it against the existing saved searches in the saved-search index, and if it matches we put a record into RabbitMQ. (The main problem with this method is that every post inserted into the website could match a huge number of saved searches.)
My big concern is scale and performance; I'd appreciate you sharing your experiences and ideas about this problem with me.
My estimation of the scale:
The expiry date of a saved search is three months
At least 200,000 saved searches per day
So we have about 9,000,000 active records
I'd appreciate it if you share your thoughts with me.
Just FYI:
- we also have RabbitMQ for our queue jobs
- our ES servers are good enough, with 64GB of RAM
Cron job - no. Continual job - yes.
Why? As things scale, or as activity spikes, cron jobs become problematic. If the cron job for 09:00 runs too long, it will compete for resources with the 10:00 instance; this can cascade into a disaster.
On the other side, if a cron job finishes 'early', then the activity oscillates between "busy" (the cron job is doing stuff) and "not busy" (cron has finished, and it's not time for the next invocation).
So, instead, I suggest a job that continually runs through all the "stored queries", doing them one at a time. When it finishes the list, it simply starts over. This completely eliminates my complaints about cron, and provides an automatic "elasticity" to handle busy/not-busy times -- the scan will slow down or speed up accordingly.
That is, it runs 'forever'. (You could use a simple cron job as a 'keep-alive' monitor that restarts it if it crashes.)
OK, "one job" re-searching "one at a time" is probably not best. But I disagree with using a queuing mechanism. Instead, I would have a small number of processes, each acting on some chunk of the stored queries. There are many ways to divide the work: grab-and-lock; gimme a hundred to work on; modulo N; etc. Each has pros and cons.
Because you are already using Elasticsearch and you have confirmed that you are creating something like Google Alerts, the most straightforward solution would be Elasticsearch Percolator.
From the official documentation, Percolator is useful when:
You run a price alerting platform which allows price-savvy customers to specify a rule like "I am interested in buying a specific electronic gadget and I want to be notified if the price of gadget falls below $X from any vendor within the next month". In this case you can scrape vendor prices, push them into Elasticsearch and use its reverse-search (Percolator) capability to match price movements against customer queries and eventually push the alerts out to the customer once matches are found.
I can't say much when it comes to performance, partly because you did not provide any example of your queries, but mostly because my findings are inconsistent.
According to this post (https://www.elastic.co/blog/elasticsearch-queries-or-term-queries-are-really-fast), Elasticsearch queries should be capable of reaching 30,000 queries/second.
However, this unanswered question (Elasticsearch percolate performance) reports a painfully slow 200 queries/second on a 16-CPU server.
With no additional information, I can only guess that the cause is configuration problems, so I think you'll have to try a bunch of different configurations to get the best possible performance.
Good luck!
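For concreteness, here is a hedged sketch of the Percolator flow against a local Elasticsearch node. It uses the 5.x+ percolator field type and the typeless 7.x endpoints, so it will not match the ES 2.4 API mentioned below, and the index and field names are made up:

<?php
// Hypothetical sketch: tiny JSON-over-HTTP helper for Elasticsearch.
function es(string $method, string $path, ?array $body = null): array {
    $ch = curl_init('http://localhost:9200' . $path);
    curl_setopt($ch, CURLOPT_CUSTOMREQUEST, $method);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    if ($body !== null) {
        curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($body));
        curl_setopt($ch, CURLOPT_HTTPHEADER, ['Content-Type: application/json']);
    }
    $out = curl_exec($ch);
    curl_close($ch);
    return json_decode($out, true);
}

// 1. An index whose documents ARE queries (field type "percolator"),
//    plus the post fields those stored queries will search against.
es('PUT', '/saved-searches', ['mappings' => ['properties' => [
    'query' => ['type' => 'percolator'],
    'title' => ['type' => 'text'],
]]]);

// 2. Store one user's search criteria as a document.
es('PUT', '/saved-searches/_doc/user-42', [
    'query' => ['match' => ['title' => 'red bicycle']],
]);

// 3. When a new post arrives, reverse-search it against ALL stored queries.
$hits = es('POST', '/saved-searches/_search', [
    'query' => ['percolate' => [
        'field'    => 'query',
        'document' => ['title' => 'Red bicycle, almost new'],
    ]],
]);
// Each hit is a saved search that matches this post -> enqueue a notification.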
This answer was written without a true understanding of the implications of a "saved search". I leave it here as discussion of a related problem, but not as a "saved search" solution. -- Rick James
If you are saving only the "query", I don't see a problem. I will assume you are saving both the query and the "resultset"...
One "saved search" per second? 2.4M rows? Simply rerun the search when needed. The system should be able to handle that small a load.
Since the data is changing, the resultset will become outdated soon. How soon? That is, the saved resultset needs to be purged rather quickly. Surely the data is not so static that you can wait a month. Maybe an hour?
Actually saving the resultset and being able to replay it involves (1) complexity in your code and (2) overhead in caching, I/O, etc.
What is the average number of times the user will look at the same search? Because of the overhead I just mentioned, I suspect the average number of views needs to be more than 2 to justify the overhead.
Bottom line: this smells like "premature optimization". I recommend:
Build the site without saving resultsets.
Stress test it to see when it will break.
Work on optimizing the slow parts.
As for RabbitMQ -- "Don't queue it, just do it". The cost of queuing and dequeuing is (1) increased latency for the user and (2) increased overhead on the system. The benefit (at your medium scale) is minimal.
If you do hit scaling problems, consider
Move clients off to another server -- away from the database. This will give you some scaling, but not 2x. To go farther...
Use replication: One Master + many readonly Slaves -- and do the queries on the Slaves. This gives you virtually unlimited scaling in the database.
Have multiple web servers -- virtually unlimited scaling in this part.
I don't understand why you want to use saved search... First, you should optimize the service so that saved search is needed as little as possible.
Have you done anything with the ES server? (What can you afford?) For example:
Have you optimized the Elasticsearch server? By default it uses 1GB of RAM. The best solution is to give it half the machine's RAM, but no more than 16GB if I remember correctly (check the docs).
How powerful is the ES machine? It prefers more cores over higher MHz.
How many ES nodes do you have? You can always add another machine to get results faster.
In my case (ES 2.4), the server slows down after a few days, so I restart it once a day.
And next:
Why do you want to fire tasks every half hour? If you already use cron, fire them every minute and flag that a query is already running. The other answer here has a better solution and explanation.
Why do you separate the result from the query?
Remember to normalize the query, so that a different order of parameters does not force a new query (see the sketch after this list).
Why do you want to use MySQL to store the results? A document-type database like Elasticsearch is better suited. xD
I propose you:
Optimize the ES structure - choose the right tokenizers for your fields.
Use asynchronous loading of results - e.g. WebSockets + Node.js.
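A minimal sketch of the normalization point above (the parameter names are invented; the idea is just that identical criteria always hash to the same key):

<?php
// Hypothetical sketch: canonicalize search criteria so that
// ?q=bike&area=Dublin and ?area=Dublin&q=bike count as the same saved search.
function saved_search_key(array $criteria): string {
    $criteria = array_filter($criteria, fn($v) => $v !== null && $v !== '');
    ksort($criteria);                      // parameter order no longer matters
    return sha1(json_encode($criteria));   // stable key for dedup and lookup
}

// Both calls print the same key:
echo saved_search_key(['q' => 'bike', 'area' => 'Dublin']), "\n";
echo saved_search_key(['area' => 'Dublin', 'q' => 'bike']), "\n";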

looking for something similar to caching

I have a query that takes about a minute to complete, since it deals with a lot of data, but I also want to put the results on a website. The obvious conclusion is to cache it (right?), but the data changes as time goes by, and I need a way to automatically rebuild the cached page, maybe every 24 hours.
Can someone point me to how to do this?
Edit: I want to make a "top 10" type of thing, so it's not displaying the page that is the problem but the amount of time it takes for the query to run.
Caching the results of the query with a 24-hour TTL (expiry) would probably work fine. Use a fragment cache, assuming this is a chunk of the page.
You can set up memcached or Redis, as stated, to store the cache. Another thing you can do is set up a job that warms the cache every 24 hours (or as desired) so that an unlucky user doesn't have to generate the cache for you.
If you know when the cache is stale based on a state or change in your database, you can expire the cache based on that. A lot of the time I use the created-at or updated-at fields as part of the cache key to assist in this process.
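A minimal sketch of that pattern with Memcached (hedged: the key name and the stand-in query are placeholders for the real minute-long one):

<?php
// Hypothetical sketch: serve the "top 10" from cache, rebuilding at most once
// every 24 hours. A separate warmer job can call the same function on a schedule.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

function top_ten(Memcached $mc, PDO $pdo): array {
    $key  = 'top10:v1';               // bump v1 -> v2 to invalidate early
    $rows = $mc->get($key);
    if ($rows === false) {            // cache miss: run the slow query once
        $rows = $pdo->query(
            'SELECT name, score FROM items ORDER BY score DESC LIMIT 10' // stand-in query
        )->fetchAll(PDO::FETCH_ASSOC);
        $mc->set($key, $rows, 86400); // TTL: 24 hours
    }
    return $rows;
}

$top = top_ten($mc, $pdo); // everyone after the first visitor gets the cached copy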
There is some good stuff in the scaling rails screencasts by envy labs and new relic. http://railslab.newrelic.com/scaling-rails, a little out of date but the principles are still the same.
Also, check out the Rails caching guides. http://guides.rubyonrails.org/caching_with_rails.html
Finally, make sure indexes are setup properly, use thoughtbots post here: http://robots.thoughtbot.com/post/163627511/a-grand-piano-for-your-violin
Typed on my phone so apologies for typos.
Think a little beyond the query. If your goal is to allow the user to view a lot of data, then grab that data as they want it rather than fighting with a monstrous query that's going to overwhelm your UI. The result not only looks better, but is much, much quicker.
My personal trick for this pattern is DataTables. It's a grid that lets you use Ajaxed queries (which are built in) to get data from your query a "chunk" at a time, as the user wants to see it. It can sort, page, filter, limit, and even search with some simple additions to the code. It even has plug-ins to export results to Excel, PDF, etc.
The biggest thing that DataTables has that others don't is a concept called "pipelining", which lets you fetch the amount to show (say 20) plus an additional amount forward and/or backward. This allows you to still run manageable queries without hitting the database every time the user clicks "next page".
I've got an app dealing with millions of records. One query of all the data would be impossible... it would just take too long. Grabbing 25 at a time, however, is lightning fast, no tricks required. Once the datatable was up, I just performance-tuned my query, did some indexing where needed, and voila... a great, responsive app.
Here's a simple example:
<table id="example"></table>
$('#example').dataTable( {
"bProcessing": true,
"bServerSide": true,
"sAjaxSource": "/processing/file.php"
} );
Use a cache store that allows auto-expiration after a certain length of time.
Memcached does it, and Redis does too, I guess!

simple MySQL performance question

I am building a very simple classifieds site.
There is a form that puts data into a MySQL table.
Now, how should this data be displayed? Is it better to build HTML pages from the data in the table and then display those to the users, OR is it better to fetch the data from the MySQL table each time a user wants to see it?
I hope I was clear!
Performance-wise, it's generally better to keep static versions of the HTML pages.
However, you may have too much dynamic content, which can bloat your disk space, and you will have to apply some extra effort to track cache expiration (which can be even more expensive than generating the content dynamically).
It's a matter of tradeoffs, and to give any further advice we would need to know the nature of your data.
If it's a blog with content updated rarely but read often, it's better to cache.
If it's a product search engine with mostly unique queries and changing stock, it's better to always query the database.
Note that MySQL implements a query cache: it can cache the result sets of queries, and if a query is repeated verbatim and no underlying tables have changed since the last run, it is served out of the cache.
It tracks cache expiration automatically, saves you from the need to keep files on disk, and generally combines the benefits of both methods. (Be aware, though, that the query cache was deprecated in MySQL 5.7.20 and removed in MySQL 8.0, so don't build a new design around it.)
You can use PHP caching techniques if the data does not change frequently; keep serving the cached contents for frequent visits.
http://www.developertutorials.com/tutorials/php/php-caching-1370/
Use both, via a caching mechanism. Based on parameters, the page would be re-rendered (if it has not been viewed in X time, or at all) or served from cache otherwise.
As stated, though, it depends heavily on the amount of data and the frequency with which it is accessed. More information would warrant a more detailed response.
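A minimal sketch of that hybrid with a plain file cache (the path, the one-hour window, and the render step are all assumptions):

<?php
// Hypothetical sketch: serve a rendered page from disk, re-rendering it only
// when the cached copy is older than an hour (or doesn't exist yet).
$cacheFile = __DIR__ . '/cache/listings.html';
$maxAge    = 3600; // seconds; tune to how fresh the listings must be

if (is_file($cacheFile) && time() - filemtime($cacheFile) < $maxAge) {
    readfile($cacheFile);    // cache hit: no database work at all
    exit;
}

ob_start();
render_listings_page();      // placeholder for the real MySQL-backed rendering
$html = ob_get_clean();
file_put_contents($cacheFile, $html, LOCK_EX); // refresh the static copy
echo $html;

function render_listings_page(): void {
    echo '<html>...listings built from the MySQL table...</html>'; // placeholder
}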
It depends on a few things. Ask yourself two easy questions:
1) How often does the content change? Are your classified ads static, or do they change a lot on the page? How much control do you want over that page, to have rotating ads, comments from users, reviews, etc.?
2) Are you going to be VERY high traffic? So much so that you are looking at a bottleneck at the database?
If you can say "Yes, no doubt, tomorrow" to question #2, go static, even if it means adding other things via Ajax or non-database calls (i.e. includes) in order to make the page pseudo-dynamic.
Otherwise, if you say "Yes" to question #1, go with a dynamic page so you have the freshest content at all times. These days users have gotten very used to instant gratification on posts and such. Gone are the days when we would wait for hours for a comment to appear in a thread (I am looking at you, Slashdot).
Hope this helps!
Start with the simplest possible solution that satisfies the requirements, and go from there.
If you implement a cache but didn't need one, you have wasted time (and/or money). You could have implemented features instead. Also, now you (might) have to deal with the cache every time you add features.
If you don't implement a cache and realize you need one, you are now in a very good position to implement a smart one, because now you know exactly what needs to be cached.

Can MySQL queries dramatically slow my website?

Hey, I'm building a website that requires me to run about five MySQL "SELECT * FROM" queries on each page, and I'd like to know if that can slow the download speed in any way.
Yes.
Here are some useful links to help you understand how to measure MySQL performance and make changes to improve it:
Linux Mag MySQL Tuning article
MySQL Docs (Memory Use)
MySQL Performance Cookbook
MySQL will have no impact on the download speed (i.e., the time it takes for HTML content to get from your server to a visitor's web browser). However, the queries may create a delay between when your server gets the request and when it can send that HTML. Here's the sequence of events:
Visitor sends a request: "Please send me example.com/some-page"
Your server does some work to generate what some-page is supposed to look like and produce appropriate HTML
Your server sends that page to the visitor
MySQL doesn't affect #1 or #3, but of course it's a key part of what's happening in #2.
The big question is: how much of an impact will it have. If your five SELECT queries are each selecting one row from a table with only a hundred rows, the total impact on performance will be negligible.
If, on the other hand, each query is doing complex JOINs and subqueries on large tables, you could easily notice a difference.
The easiest way to get a sense of this impact is to connect directly to your MySQL server (i.e., not through your PHP script) and run those queries to see how long they take. If they're running slowly, you can always come back to StackOverflow for advice on how to make a particular query run more efficiently.
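A minimal sketch of that timing exercise from PHP, in case the command-line client isn't handy (the table names are stand-ins for your five real queries):

<?php
// Hypothetical sketch: time each query in isolation to see where the time goes.
$pdo = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass');

$queries = [
    'SELECT * FROM articles',   // stand-ins for the page's five real queries
    'SELECT * FROM comments',
];

foreach ($queries as $sql) {
    $t0 = microtime(true);
    $pdo->query($sql)->fetchAll();
    printf("%8.1f ms  %s\n", (microtime(true) - $t0) * 1000, $sql);
}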
Sure. Especially if the tables contain lots of rows.
If you have queries that take a long time, then the page will appear to take longer to load. Once the server has finished creating the HTML to send to the client (where the queries happen), the download speed will depend on how big the content of the page is.