Parameters in MySQL stored procedure

I want to create a MySQL stored procedure (SP) with input parameters.
However, the number of parameters cannot be determined at the time of writing the SP.
(The scenario is that the users will have multiple options to choose from. The options chosen will form the search criteria:
select ...
where prod_category = option1 && option2 && option3 &&...
So, if someone chooses only option1 and option2, only 2 parameters will be sent. Sometimes 50+ options may be chosen, and hence 50+ parameters will have to be sent.)
So, I have 3 questions:
1. Can I handle such a scenario using MySQL stored procedures (SP)?
2. Is an SP the professional way to handle such a scenario?
3. If an SP is not the professional way to handle these scenarios, is there anything else that will handle these searches efficiently? Search is the core functionality of my application.
Thanks in advance for any help!

MySQL stored procedures accept only a fixed number of arguments. You can pack your parameter names and values into a single delimited string argument and parse them inside the procedure, or build the query in your application language instead.
From http://forums.mysql.com/read.php?98,154749,155001#msg-155001
No, MySQL sprocs accept only a fixed number of arguments. ISO SQL is
somewhat optimised for correct RDBMS logic (unless you were to ask EF
Codd, CJ Date or Fabian Pascal), but in many ways SQL is quite
literal, so if SQL seems to make what you are trying to do very
difficult, it could be that your design needs another look, in this
case aspects of the design that require repeated multiple ad hoc
deletions.
If such deletions are unavoidable, here are three options. First, in
an application language build the delete query to include
comma-delimited string of IDs. Second, pass such a string to an sproc
that PREPAREs the query using such a string. Third, populate a temp
table with the target IDs and write a simple join query that deletes
the joined IDs.
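To make the third option concrete, here is a minimal sketch in MySQL, assuming a hypothetical orders table and an ID list supplied by the application:

CREATE TEMPORARY TABLE target_ids (id INT PRIMARY KEY);
-- the application inserts however many IDs were chosen
INSERT INTO target_ids (id) VALUES (3), (17), (42);
-- delete by joining against the temp table
DELETE o
FROM orders AS o
JOIN target_ids AS t ON t.id = o.id;
DROP TEMPORARY TABLE target_ids;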

There are lots of great reasons to use stored procedures. Here's an article that lists some. Hopefully that will address the "professionalism" question.
As for the passing of parameters, I don't believe you can have a variable list.
A long time ago, I saw it "done" by writing the values to a table and having the stored procedure read them back in. (Use a session_id in the table and then pass that to the procedure).
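A rough sketch of that idea, with invented table, column, and procedure names:

CREATE TABLE search_params (
    session_id  VARCHAR(64)  NOT NULL,
    param_value VARCHAR(255) NOT NULL,
    INDEX (session_id)
);

DELIMITER //
CREATE PROCEDURE search_products(IN p_session VARCHAR(64))
BEGIN
    -- match any product whose category is among this session's options
    SELECT p.*
    FROM products AS p
    JOIN search_params AS sp
      ON sp.param_value = p.prod_category
    WHERE sp.session_id = p_session;
END//
DELIMITER ;

The application INSERTs one row per chosen option under a fresh session_id, then runs CALL search_products('abc123').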
As for "efficiency", it depends on your definition. There might be a slight speed benefit to the stored procedures, but I wouldn't worry about that. What did you mean?

Related

MySQL - Best methods to provide fast Dynamic filter support for large-scale database record lists?

I am curious what techniques database developers and architects use to create stored procedures (or functions) that return dynamically filtered data from large-scale databases.
For example, let's take a database with millions of people in it, and we want to provide a stored procedure "get-person-list" which takes a JSON parameter. Within this JSON parameter, we can define filters such as $.filter.name.first, $.filter.name.last, $.filter.phone.number, $.filter.address.city, etc.
The frontend (web solution) allows the user to define one or more filters, so the frontend can say "Show me everyone with a first name of Ted and a last name of Smith in San Diego."
The payload would look like this:
{
  "filter": {
    "name": {
      "last": "smith",
      "first": "ted"
    },
    "address": {
      "city": "san diego"
    }
  }
}
Now, what would the best technique be to write a single stored procedure capable of handling numerous (dozens or more) filter settings (dynamically) and returning the proper result set all with the best optimization/speed?
Is it possible to do this with a CTE, or are prepared statements based on IF/THEN logic (building out the SQL to be executed based on filter values) the best/only real method?
How do big companies with huge databases and thousands of users write their calls to return complex dynamic lists of data as quickly as possible?
Everything Bill wrote is true, and good advice.
I'll take it a little further. You're proposing building a search layer into your system, which is fine.
You're proposing an interface in which you pass a JSON object to code inside the DBMS. That's not fine. That code will either contain a bunch of canned queries handling the various search scenarios, or a mess of string-handling code that reads the JSON, puts together appropriate queries, then uses MySQL's PREPARE statement to run them. From my experience that is, with respect, a really bad idea.
Here's why:
1. The stored-procedure language has very weak string handling compared to host languages: no sprintf, no arrays of strings, no join or implode operators, and clunky regex that isn't present on every server. You're going to need string handling to build search queries.
2. Stored procedures are trickier to debug, test, deploy, and maintain than ordinary application code. That work requires special skills and special access.
3. You will need to maintain this code, especially if your system proves successful. You'll add requirements that will require expanding your search capabilities.
4. It's impossible (seriously, impossible) to know what your actual application usage patterns will be at scale. You surely will, as a consequence of growth, find usage patterns that surprise you. My point is that you can't design and build a search system and then forget about it; it will evolve along with your app.
5. To keep up with evolving usage patterns, you'll need to refactor some queries and add some indexes, and you will be under pressure when you do that work: people will be complaining about performance. See points 1 and 2 above.
6. MySQL's and MariaDB's stored procedures aren't compiled with an optimizing compiler, unlike Oracle's and SQL Server's, so there's no compelling performance win.
So don't use a stored procedure for this. Please. Ask me how I know this sometime.
If you need a search module with a JSON interface, implement it in your favorite language (php, C#, nodejs, java, whatever). It will be easier to debug, test, deploy, and maintain.
To write a query that searches a variety of columns, you would have to write dynamic SQL. That is, write code to parse your JSON payload for the filter keys and values, and format SQL expressions in a string that is part of a dynamic SQL statement. Then prepare and execute that string.
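A minimal sketch of that approach, assuming a hypothetical people table and handling just two of the filter keys (MySQL 5.7+ JSON functions):

DELIMITER //
CREATE PROCEDURE get_person_list(IN p_filter JSON)
BEGIN
    SET @sql = 'SELECT * FROM people WHERE 1=1';
    -- append one predicate per filter key that is present
    IF JSON_EXTRACT(p_filter, '$.filter.name.last') IS NOT NULL THEN
        SET @sql = CONCAT(@sql, ' AND last_name = ',
            QUOTE(JSON_UNQUOTE(JSON_EXTRACT(p_filter, '$.filter.name.last'))));
    END IF;
    IF JSON_EXTRACT(p_filter, '$.filter.address.city') IS NOT NULL THEN
        SET @sql = CONCAT(@sql, ' AND city = ',
            QUOTE(JSON_UNQUOTE(JSON_EXTRACT(p_filter, '$.filter.address.city'))));
    END IF;
    PREPARE stmt FROM @sql;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
END//
DELIMITER ;

QUOTE() handles the escaping, and every additional filter key means another IF block, which is exactly the string-handling mess described above.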
In general, you can't "optimize for everything." Trying to optimize when you don't know in advance which queries your users will submit is a nigh-impossible task. There's no perfect solution.
The most common method of optimizing search is to create indexes. But you need to know the types of search in advance to create indexes. You need to know which columns will be included, and which types of search operations will be used, because the column order in an index affects optimization.
For N columns there are N-factorial permutations of columns, but creating an index for each is clearly impractical, not least because MySQL allows only 64 indexes per table. You simply can't create all the indexes needed to optimize every possible query your users attempt.
The alternative is to optimize queries partially, by indexing a few combinations of columns, and hope that these help the users' most common queries. Use application logs to determine what the most common queries are.
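For example, a composite index only helps queries that filter on a leftmost prefix of its columns; with invented names:

-- helps WHERE last_name = ... and WHERE last_name = ... AND city = ...
CREATE INDEX idx_last_city ON people (last_name, city);
-- but not WHERE city = ... alone; that needs its own index
CREATE INDEX idx_city ON people (city);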
There are other types of indexes. You could use fulltext indexing, either the implementation built into MySQL, or a supplement to your MySQL database such as ElasticSearch or similar technology. These provide a different kind of index that effectively indexes everything with one index, so you can search based on multiple columns.
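With the built-in implementation, that looks roughly like this (names invented for the sketch):

ALTER TABLE people
    ADD FULLTEXT INDEX ft_person (first_name, last_name, city);

SELECT *
FROM people
WHERE MATCH(first_name, last_name, city)
      AGAINST('+ted +smith "san diego"' IN BOOLEAN MODE);

One fulltext index covers all three columns, so a single query form serves many filter combinations.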
There's no single product that is "best." Which fulltext indexing technology meets your needs requires you to evaluate different products. This is some of the unglamorous work of software development — testing, benchmarking, and matching product features to your application requirements. There are few types of work that I enjoy less. It's a toss-up between this and resolving git merge conflicts.
It's also more work to manage copies of data in multiple datastores, making sure data changes in your SQL database are also copied into the fulltext search index. This involves techniques like ETL (extract, transform, load) and CDC (change data capture).
But you asked how big companies with huge databases do this, and this is how.
Input
I do that "all the time". The web page has a <form>. When it is submitted, I look for fields of the form that were filled in, then build
WHERE this = "..."
AND that = "..."
into the suitable SELECT statement.
Note: I leave out any fields that were not specified in the form; I make sure to escape the strings.
I'm walking through $_GET[] instead of JSON, so it is quite easy.
INDEXing
If you have a column for each possible field, then it is a matter of providing indexes only for the columns most likely to be searched on. (There are practical, and even hard-coded, limits on indexes.)
If you have stored the attributes in an EAV table structure, you have my condolences. Search the [entity-attribute-value] tag for many other poor souls who wandered into that swamp.
If you store the attributes in JSON, well, that is likely to be an order of magnitude worse than EAV.
If you throw all the information into a FULLTEXT column and use MATCH, then you can get enough speed for "millions" of rows. But it comes with various caveats (word length, stoplist, word endings, surprise matches, etc.).
If you would like to discuss further, then scale back your expectations and make a list of likely search keys. We can then discuss what technique might be best.

How can I store a query in a MySQL column then subsequently use that query?

For example, say I'm creating a table which stores promo codes for a shipping site. I want to have the table match a promo code with the code's validator, e.g.
PROMO1: Order must have 3 items
PROMO2: Order subtotal must be greater than $50
I would like to store the query and/or routine in a column in the table, and be able to use the content to validate, in the sense of
SELECT * FROM Orders
WHERE Promo.ID = 2 AND Promo.Validation = True
Or something to that effect. Any ideas?
I wouldn't save the query in the database; there are far better possibilities. You have to decide which best fits your needs (it's not clear to me from your question). You can use Views, Prepared Statements, or Stored Procedures.
There's probably a better way to solve the issue, but the answer to your question is to write stored procedures that return the results you want. Where I work (and I hate this design), they actually store all queries and DML used by the application in stored procedures.
You can also build your queries dynamically using dynamic SQL. For MySQL, see the post below, which might be of some help to you.
How To have Dynamic SQL in MySQL Stored Procedure
Otherwise, you can also store your queries in a string format in the database, and retrieve them and execute them using the EXECUTE statement, such as that post points out.
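Mechanically, that would look something like the following, assuming a hypothetical promos table with a validation_sql column holding the stored query text:

-- fetch the stored query for promo 2 and run it
SELECT validation_sql INTO @sql FROM promos WHERE id = 2;
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;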
I'd personally stay away from designs like that, though. Storing queries in XML isn't a bad alternative: write your app to be extensible and configurable from XML, so that adding new validation logic takes a configuration change rather than a code change.

Data paging in Linq to sql vs straight sql - which one is better?

Bit of a theoretical question here.
I have made a database search interface for an ASP.NET website. I am using LINQ to SQL for the BL operations. I have been looking at different ways to implement efficient paging and I think I have the best method now, but I am not sure how great the difference in performance really is and was wondering if any experts have explanations/advice to give?
METHOD 1: The traditional method I have seen in a lot of tutorials uses pure LINQ to SQL and seems to create one method to get the data, and then one method which returns the count for the pager. I guess these could be grouped in a single method, but essentially an IQueryable is created which holds the whole query, and IQueryable.Count() and IQueryable.Skip().Take() are then performed on it.
I saw some websites criticising this method because it causes two queries to be evaluated, which apparently is not as efficient as using a stored procedure...
Since I am using fulltext search anyway, I needed to write an SP for my search, so in light of the previous comments I decided to do the paging and counting in the SP. So I got:
METHOD 2: A call to the stored procedure from the BL. In the SP, the WHERE clause is assembled according to the fields specified by the user and a dynamic query is created. The results of the dynamic query are inserted into a table variable with a temporary identity key, on which I perform a COUNT(*) and a SELECT WHERE (temp_ID >= x AND temp_ID < y).
It looks to me like those two methods are in principle performing the same operations...
I was wondering whether method 2 actually is more efficient than method 1 (regardless of the fact that fulltext is not available in LINQ to SQL...). And why? And by how much?
In my understanding, the SP requires the query to be generated only once, so that should be more efficient. But what other benefits are there?
Are there any other ways to perform efficient paging?
I finally got around to doing some limited benchmarks on this. I've only tested this on a database with 500 entries, because that is what I have to hand so far.
In one case I used a dynamic SQL query with
SELECT *, ROW_NUMBER() OVER(...) AS RN ... FROM ... WHERE RN BETWEEN @PageSize * @PageCount AND @PageSize * (@PageCount + 1)
in the other I used the exact same query, but without the ROW_NUMBER() and WHERE ... clauses, and did a
db.StoredProcedure.ToList().Skip(PageSize * PageCount).Take(PageSize);
in the method.
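For reference, the ranking alias can't be used in the WHERE clause of the same SELECT, so the real query wraps the ranking in a derived table; a rough sketch with invented names (the @ variables stand in for the procedure's parameters, and the form works in both SQL Server and MySQL 8+):

SELECT *
FROM (
    SELECT p.*, ROW_NUMBER() OVER (ORDER BY p.Id) AS RN
    FROM Products AS p
) AS ranked
WHERE RN >  @PageSize * @PageCount
  AND RN <= @PageSize * (@PageCount + 1);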
I tried returning datasets of 10 and 100 items, and as far as I can tell the difference in execution time is negligible: 0.90s for the stored procedure versus 0.89s for the LINQ to SQL version.
I also tried adding count methods as you would do if you wanted to make a pager. In the stored procedure this seems to add a very slight overhead (going from 0.89s to 0.92s) from performing a second select on the full set of results. That would probably increase with the size of the dataset.
I also added a second call to the LINQ to SQL query with a .Count() on it, as you would if you had the two methods required for ASP.NET paging, and that didn't seem to affect execution speed at all.
These tests probably aren't very meaningful given the small amount of data, but those are the kinds of datasets I work with at the moment. You'd probably expect LINQ to SQL to take a performance hit as the datasets to evaluate become larger...

Is there an efficient string matching algorithm in MySQL?

Is there an implementation of a fast string matching algorithm for searching keywords in MySQL? For example Aho-Corasick or any other fast string matching algorithm.
Typically Aho-Corasick is implemented in Java or some other compiled language, but it should be possible to write it as a stored procedure in MySQL.
Thanks!
As stored procedures are Turing-complete, and you can use a cursor to loop through the records in a table (possibly with some existing WHERE clause), you can do it in a stored procedure.
A stored function would also be possible.
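To give a sense of the shape, here is a skeleton of such a cursor loop, with an invented documents table and the matching logic left as a stub:

DELIMITER //
CREATE PROCEDURE match_keywords()
BEGIN
    DECLARE done INT DEFAULT 0;
    DECLARE v_text TEXT;
    DECLARE cur CURSOR FOR SELECT body FROM documents;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
    OPEN cur;
    read_loop: LOOP
        FETCH cur INTO v_text;
        IF done THEN LEAVE read_loop; END IF;
        -- run the keyword-matching automaton over v_text here
    END LOOP;
    CLOSE cur;
END//
DELIMITER ;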
However, the MySQL stored-routine language is so terrible both in terms of programmer-usability and performance, that the result is unlikely to be easy or fast.
So you might be better off writing a MySQL UDF (which you can write in any language, provided you can make it look like a C library) and having that do it instead.
Consider your specific requirements. I am assuming that a query with lots of "OR col LIKE ..." strung together is too inefficient for you, as you wish to match thousands of patterns at once, right?

MySQL: Views vs Stored Procedures

Since MySQL started supporting stored procedures, I've never really used them. Partly because I'm not a great query writer, partly because I often work with DBAs who make those choices for me, partly because I'm just comfy with What I Know.
In terms of doing data selection, specifically when considering a select that is essentially a de-normalization (joins) and aggregate (avg or max, subqueries w/counts, etc) selection of data, what is the right choice in MySQL 5.x? A view? Or a stored procedure?
Views I'm comfortable with: you know what your SELECT query is supposed to look like, so you just create that, make sure it's indexed and whatnot, then just do a CREATE VIEW [View] AS SELECT [...]. Then, in my application, I treat the view as a read-only table; it represents a de-normalized version of my normalized data.
What are the disadvantages here - if any? And what would change (gains or losses) if I moved that exact same SELECT statement into a stored procedure?
I'm hoping to find some good 'under the hood' info that has been difficult to find while googling this topic but really I welcome all comments and answers.
In my opinion, stored procedures should be used solely for data manipulation, when the same routine needs to be used by several different applications, or for ETL between databases or tables, nothing more. Basically, do as much in code as you can until you run into the DRY principle or until what you are doing is simply moving data from one place to another within the DB.
Views can be used to provide an alternate or simplified "view" into the data. As such, I would go with a view, as you are not really manipulating the data so much as finding a different method of displaying it.
I'm not sure it's an either/or choice. Stored procedures can do a wide variety of things that views would struggle with (think populating data in a temp table, running a cursor over it, then doing aggregation and returning a result set).
Views, on the other hand, can hide complex SQL and access rights, and present a modified view of the schema.
I think both have a place in the scheme of things and both are useful for a successful schema implementation.
I use views for de-normalisation or output formatting and stored procedures for filtering and data manipulation (things that require parameter inputs) or iteration (cursors).
I often access a view inside a stored procedure when both de-normalisation and filtering are required.
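In sketch form, with invented names:

CREATE VIEW order_summary AS
SELECT c.name, o.id AS order_id, SUM(i.price) AS total
FROM customers c
JOIN orders o      ON o.customer_id = c.id
JOIN order_items i ON i.order_id = o.id
GROUP BY c.name, o.id;

DELIMITER //
CREATE PROCEDURE orders_over(IN p_min DECIMAL(10,2))
BEGIN
    -- the view does the de-normalisation; the parameter does the filtering
    SELECT * FROM order_summary WHERE total >= p_min;
END//
DELIMITER ;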
One thing to note: at least with MySQL, a view's results are stored in a temporary table, and unlike most decent database engines that table is not indexed. So if you're just using a view to simplify queries, views are great when your program grabs all of the view's results; but if you then search the results of that view based on parameters, it is incredibly slow, especially if there are millions of records to sift through, and even worse if the view is built on top of other views, and so on.
With a stored procedure, however, you can pass those search parameters in and run the query directly against the underlying (indexed) tables. The downside is that the results need to be fetched every time the procedure is run, which may happen with a view anyway, depending on server configuration.
So basically: if you're using a view, try to minimise the number of results (if you then need to search them); otherwise use a stored procedure.