select * math operations - mysql

Is it possible to select columns and do complex operations on them? For example: select factorial(column1) from table1, or select integral_of(something) from table2.
Perhaps there are libraries that support such operations?

Yes, you can call all the pre-defined functions of your DB on the selected columns, and you can use CREATE FUNCTION to define your own.
But DBs are meant to wade through huge amounts of data, not to do complex calculations on them. If you try this, you'll find that many operations are awfully slow (especially user-defined ones).
Which is why most people fetch the data from the database and then do the complex math on the application side. This also makes it simpler to test and optimize the code, or to replace it with a new version.
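As a quick illustration of the user-defined-function idea, Python's sqlite3 module lets you register an application-side function and then call it from a SELECT; this is a sketch with SQLite standing in for MySQL (MySQL's CREATE FUNCTION defines the function in SQL itself instead):

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (column1 INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?)", [(3,), (5,)])

# Register an application-side Python function so SQL can call it by name.
conn.create_function("factorial", 1, math.factorial)

rows = [r[0] for r in
        conn.execute("SELECT factorial(column1) FROM table1 ORDER BY column1")]
print(rows)  # [6, 120]
```

Note that the math still runs in the application process here, which is exactly the trade-off discussed above: the DB orchestrates the call, but it gains you no computational speed.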

Yes, it is. If the function you want is not built into your RDBMS, you can write your own User Defined Functions.
You'll find an example here: http://www.15seconds.com/Issue/000817.htm.

Related

SQL and fuzzy comparison

Let's assume we have a table of People (name, surname, address, SSN, etc).
We want to find all rows that are "very similar" to specified person A.
I would like to implement some kind of fuzzy-logic comparison of A and all rows from the People table. There will be several fuzzy inference rules working separately on several columns (e.g. 3 fuzzy rules for name, 2 rules for surname, 5 rules for address).
The question is: which of the following two approaches would be better, and why?
Implement all fuzzy rules as stored procedures and use one heavy SELECT statement to return all rows that are "very similar" to A. This approach may include using soundex, similarity metrics, etc.
Implement one or more simpler SELECT statements that return less accurate results, "rather similar" to A, and then fuzzy-compare A with all returned rows (outside the database) to get the "very similar" rows. So the fuzzy comparison would be implemented in my favorite programming language.
Table People should have up to 500k rows, and I would like to make about 500-1000 queries like this a day. I use MySQL (but this is yet to be considered).
I don't really think there is a definitive answer because it depends on information not available in the question. Anyway, too long for a comment.
DBMSes are good at retrieving information according to indexes. It does not make sense to have a DB server waste time on heavy computations unless it is dedicated to that specific purpose (as answered by #Adrian).
Therefore, your client application should delegate to the DBMS the retrieval of the information required by the rules.
If the computations are minor, all could be done on the server. Else, pull it off into the client system.
The disadvantage of the second approach lies in the amount of data traveling from the server to the client and the number of connections to establish. So it is typically a compromise between computation on the server and data transfer to the client, a balance to be struck depending on the specifics of the fuzzy rules.
Edit: I've seen in a comment that you are almost sure you'll have to implement the code on the client. In that case, you should consider an additional criterion, code locality, for maintenance purposes: try to keep related code together rather than spreading it between systems (and languages).
I would say you're best off using simple selects to get the closest matches you can without hammering the database, then doing the heavy lifting in your application layer. The reason I suggest this is scalability: if you do the heavy lifting in the application layer, your problem is a perfect use case for a map-reduce-style solution, in which you distribute the processing of similarities across nodes and get your results back much faster than if you put it through the database. Plus, this way you're not locking up your database and slowing down any other operations that may be going on at the same time.
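That two-stage approach can be sketched as follows (illustrative table, filter rule, and threshold; SQLite stands in for MySQL, and a plain Levenshtein edit distance plays the role of the fuzzy rules):

```python
import sqlite3

def levenshtein(a, b):
    # Classic dynamic-programming edit distance, computed row by row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, surname TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [("John", "Smith"), ("Jon", "Smyth"), ("Mary", "Jones")])

# Step 1: cheap, coarse SQL filter (here: surnames sharing a first letter).
target = ("John", "Smith")
candidates = conn.execute(
    "SELECT name, surname FROM people "
    "WHERE substr(surname, 1, 1) = ? ORDER BY surname",
    (target[1][0],)).fetchall()

# Step 2: precise fuzzy scoring in the application on the shortlist only.
very_similar = [row for row in candidates
                if levenshtein(row[1], target[1]) <= 1]
print(very_similar)  # [('John', 'Smith'), ('Jon', 'Smyth')]
```

The coarse filter keeps the data transfer small, and the expensive comparison only runs over the shortlist, which is the compromise described above.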
Since you're still considering which DB to use: PostgreSQL has the fuzzystrmatch module, which provides Levenshtein and Soundex functions. Also, you might want to look at the pg_trgm module as described here. Maybe you could also put an index on the soundex() of the column so you won't have to calculate it every time.
But you seem to be optimizing prematurely, so my advice would be to test using pg first and then see whether you actually need to optimize. The numbers you provided really don't seem like a lot, considering you have almost two minutes to run each query (1000 queries spread over a day is one every 86 seconds or so).
An option I'd consider is to add a column to the People table that holds the SoundEx value of the person's name.
I've done joins using
Select [Column]
From People P
Inner Join TableA A On Soundex(A.ComparisonColumn) = P.SoundexColumn
That'll return anything in TableA that has the same SoundEx value as the People table's SoundEx column.
I haven't used that kind of query on tables of that size, but I see no issue with trying it. You can also index that SoundEx column to help with performance.
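For reference, the algorithm behind that Soundex() call can be sketched in Python; this is standard American Soundex, and MySQL's implementation differs slightly (for one, it does not truncate to four characters):

```python
# American Soundex: keep the first letter, map the rest to digit classes,
# collapse adjacent duplicate codes, treat H/W as transparent, pad to 4 chars.
_CODES = {}
for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")]:
    for ch in letters:
        _CODES[ch] = digit

def soundex(name):
    name = name.upper()
    result = name[0]
    prev = _CODES.get(name[0], "")
    for ch in name[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "HW":   # H and W do not reset the previous code
            prev = code
    return (result + "000")[:4]

print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

Because similar-sounding names collapse to the same four-character code, an equality join (or an index lookup) on the precomputed column does the coarse matching for you.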

How can I store a query in a MySQL column then subsequently use that query?

For example, say I'm creating a table which stores promo codes for a shipping site. I want to have the table match a promo code with the code's validator, e.g.
PROMO1: Order must have 3 items
PROMO2: Order subtotal must be greater than $50
I would like to store the query and/or routine in a column in the table, and be able to use the content to validate, in the sense of
SELECT * FROM Orders
WHERE Promo.ID = 2 AND Promo.Validation = True
Or something to that effect. Any ideas?
I wouldn't save the query in the database, there are far better possibilities.
You have to decide which best fits your needs (it's not clear to me from your question). You can use
Views
or
Prepared Statements
or
Stored Procedures
There's probably a better way to solve the issue, but the answer to your question is to write stored procedures that return the results you want. Where I work (and I hate this design), they actually store all queries and DML used by the application in stored procedures.
You can also dynamically build your queries using dynamic SQL. For MySQL, see the post below, which might be of some help to you.
How To have Dynamic SQL in MySQL Stored Procedure
Otherwise, you can also store your queries as strings in the database, retrieve them, and execute them using the EXECUTE statement, as that post points out.
I'd personally keep away from designs like that, though. Storing queries in XML isn't a bad alternative: write your app to be extensible and configurable from XML, so you don't need to make code changes to add new validation logic and instead just have to configure it in XML.
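The configurable-validation idea can be sketched application-side like this (a hypothetical example: the rule names, the dictionary of validators, and the shape of the order are all made up for illustration; in practice the rules would be loaded from XML or another config source):

```python
# Hypothetical promo validators keyed by code; an "order" is a plain dict here.
validators = {
    "PROMO1": lambda order: order["item_count"] >= 3,   # order must have 3 items
    "PROMO2": lambda order: order["subtotal"] > 50,     # subtotal over $50
}

def promo_is_valid(code, order):
    rule = validators.get(code)
    return rule is not None and rule(order)

print(promo_is_valid("PROMO2", {"item_count": 1, "subtotal": 75.0}))  # True
print(promo_is_valid("PROMO1", {"item_count": 1, "subtotal": 75.0}))  # False
```

Adding a new promo code then means adding one entry to the configuration, not editing SQL stored in a table column.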

Best practices: are SQL views really worth it? [duplicate]

This question already has answers here:
Why do you create a View in a database?
(25 answers)
Closed 8 years ago.
I am building a new web application with data stored in a database. As in many web applications, I need to expose data from complex SQL queries (queries with conditions over multiple tables). I was wondering if it would be a good idea to build my queries in the database as SQL views instead of building them in the application. I mean, what would the benefit of that be? Database performance? Will it take me longer to code? Longer to debug?
Thank you
This cannot really be answered objectively, since it varies from case to case.
With views, functions, triggers and stored procedures you can move part of the logic of your application into the database layer.
This can have several benefits:
performance -- you might avoid round trips of data, and certain treatments are handled more efficiently using the whole range of DBMS features.
conciseness -- some treatments of data are expressed more easily with the DBMS features.
But also certain drawbacks:
portability -- the more you rely on specific features of the DBMS, the less portable the application becomes.
maintainability -- the logic is scattered across two different technologies, which implies more skills are needed for maintenance, and local reasoning is harder.
If you stick to the SQL92 standard it's a good trade-off.
My 2 cents.
I think your question is a little confusing in what you are trying to achieve (gain knowledge about SQL views, or learn how to structure your application).
I believe all database logic should be stored at the database tier, ideally in a stored procedure or function, rather than in the application logic. The application logic should then connect to the database, retrieve the data from these procedures/functions, and expose it in your application.
One of the benefits of storing this at the database tier is taking advantage of cached execution plans in SQL Server (which you may not get by accessing it directly via the application logic). Another advantage is that you decouple the application: if the database needs to be changed, you don't have to modify the application directly.
For a direct point on Views, the advantages of using them include:
Restrict a user to specific rows in a table.
For example, allow an employee to see only the rows recording his or her work in a labor-tracking table.
Restrict a user to specific columns.
For example, allow employees who do not work in payroll to see the name, office, work phone, and department columns in an employee table, but do not allow them to see any columns with salary information or personal information.
Join columns from multiple tables so that they look like a single table.
http://msdn.microsoft.com/en-us/library/aa214068(v=sql.80).aspx
Personally I prefer views, especially for reports/apps as if there are any problems in the data you only have to update a single view rather than re-building the app or manually editing the queries.
SQL views have many uses. Try first reading about them and then asking a more specific question:
http://odetocode.com/code/299.aspx
http://msdn.microsoft.com/en-us/library/ms187956.aspx
I have seen that views are used a lot to do two things:
Simplify queries: if you have a huge SELECT with join after join after join, you can create a view that will have the same performance, but the query will be only a couple of lines.
For security reasons: if you have a table with some information that shouldn't be accessible to all developers, you can create views and grant privileges to see the views rather than the main table. E.g.:
Table people: name, last_name, user_id, credit_card, social_security. You create a view exposing only name, last_name, user_id.
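Sketched with SQLite for illustration (column names taken from the example above; in a real deployment you would also GRANT on the view rather than the base table, which SQLite itself doesn't support):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE people
                (name TEXT, last_name TEXT, user_id INTEGER,
                 credit_card TEXT, social_security TEXT)""")
conn.execute("INSERT INTO people VALUES "
             "('Ann', 'Lee', 1, '4111', '078-05-1120')")

# The view exposes only the non-sensitive columns.
conn.execute("CREATE VIEW people_public AS "
             "SELECT name, last_name, user_id FROM people")

cur = conn.execute("SELECT * FROM people_public")
cols = [d[0] for d in cur.description]
print(cols)  # ['name', 'last_name', 'user_id']
```

Queries against people_public never see the credit_card or social_security columns, even with SELECT *.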
You can run into performance issues and constraints on the types of queries you can run against a view.
Restrictions on what you can do with views.
http://dev.mysql.com/doc/refman/5.6/en/view-restrictions.html
Looks like the big one is that you cannot create an index on a view. This could cause a big performance hit if your final result set is large.
This is also a good forum discussing views: http://forums.mysql.com/read.php?100,22967,22967#msg-22967
In my experience, a well-indexed table, using the right engine and properly tuned (for example, setting an appropriate key_buffer value), can perform better than a view.
Alternatively you could create a trigger that updates a table based on the results of other tables. http://dev.mysql.com/doc/refman/5.6/en/triggers.html
The technique you are describing is called denormalization. Cal Henderson, a software engineer from Flickr, openly supports this technique.
In theory, a JOIN is one of the most expensive operations, so it is considered good practice to denormalize, since you transform n queries with m JOINs each into 1 query with m JOINs plus n queries that select from the view.
That said, the best way is to test it for yourself. Because what could be incredibly good for Flickr may not be so good for your application.
Moreover, the performance of views can vary a lot from one RDBMS to another. For instance, depending on the RDBMS, views can be updated when the original table changes, can have indexes, etc.

Data paging in Linq to sql vs straight sql - which one is better?

Bit of a theoretical question here.
I have made a database search interface for an ASP.NET website. I am using linq to sql for the BL operations. I have been looking at different ways to implement efficient paging and I think I have got the best method now, but I am not sure how great the difference in performance really is and was wondering if any experts have any explanations/advice to give?
METHOD 1: The traditional method I have seen in a lot of tutorials uses pure Linq to SQL and seems to create one method to get the data, and then another method which returns the count for the pager. I guess these could be grouped into a single method, but essentially the result seems to be that an IQueryable is created which holds the whole query, and IQueryable.Count() and IQueryable.Skip().Take() are then performed on it.
I saw some websites criticising this method because it causes two queries to be evaluated, which apparently is not as efficient as using a stored procedure...
Since I am using full-text search anyway, I needed to write an SP for my search, so in light of the previous comments I decided to do the paging and counting in the SP. So I got:
METHOD 2: A call to the stored procedure from the BL. In the SP, the WHERE clause is assembled according to the fields specified by the user and a dynamic query is created. The results of the dynamic query are inserted into a table variable with a temporary identity key, on which I perform a COUNT(*) and a SELECT WHERE (temp_ID >= x AND temp_ID < y).
It looks to me like those two methods are in principle performing the same operations...
I was wondering whether method 2 actually is more efficient than method 1 (regardless of the fact that fulltext is not available in linq to sql...). And why? And by how much?
In my understanding, the SP requires the query to be generated only once, so that should be more efficient. But what other benefits are there?
Are there any other ways to perform efficient paging?
I finally got around to doing some limited benchmarks on this. I've only tested this on a database with 500 entries, because that is what I have to hand so far.
In one case I used a dynamic SQL query along the lines of
SELECT * FROM (
    SELECT *, ROW_NUMBER() OVER (ORDER BY ...) AS RN FROM ...
) AS paged
WHERE RN BETWEEN @PageSize * @PageCount + 1 AND @PageSize * (@PageCount + 1)
in the other I used the exact same query, but without the ROW_NUMBER() and WHERE clauses, and did a
db.StoredProcedure.ToList().Skip(PageSize * PageCount).Take(PageSize);
in the method.
I tried returning result sets of 10 and 100 items, and as far as I can tell the difference in the time taken is negligible: 0.90s for the Linq to SQL version, 0.89s for the stored procedure.
I also tried adding the count methods you would need to build a pager. In the stored procedure this seems to add a very slight overhead (going from 0.89s to 0.92s) from performing a second select on the full set of results. That would probably increase with the size of the dataset.
I added a second call to the Linq to SQL query with a .Count() on it, as you would if you used the two methods required for ASP.NET paging, and that didn't seem to affect execution speed at all.
These tests probably aren't very meaningful given the small amount of data, but that's the kind of datasets I work with at the moment. You'd probably expect a performance hit in Linq to SQL as the datasets to evaluate become larger...
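The count-plus-page pattern both methods implement can be sketched in Python against SQLite, with LIMIT/OFFSET playing the role of Skip()/Take() (table name, page size, and zero-based page numbering are illustrative choices):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (name) VALUES (?)",
                 [(f"item{i}",) for i in range(500)])

def fetch_page(page, page_size=10):
    # Two queries, as in METHOD 1: a COUNT(*) for the pager, then a
    # LIMIT/OFFSET query that retrieves just the requested page.
    total = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    rows = conn.execute(
        "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?",
        (page_size, page * page_size)).fetchall()
    return total, rows

total, rows = fetch_page(2)
print(total, rows[0])  # 500 (21, 'item20')
```

The ORDER BY is essential: without a stable ordering, consecutive pages are not guaranteed to be disjoint.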

Is a MySQL view faster than a normal query?

I have a complex query using many joins (8 in fact). I was thinking of simplifying it into a view. After a little research I can see the benefits in simplicity and security. But I didn't see any mentions of speed.
Do views work like prepared statements, where the query is pre-compiled? Is there any notable performance gain from using views?
No, a view is simply a stored text query. You can apply WHERE and ORDER BY clauses against it, and the execution plan will be calculated with those clauses taken into consideration.
A view's basically a stored subquery. There's essentially no difference between:
SELECT *
FROM someview
and
SELECT *
FROM (
    SELECT somestuff
    FROM underlying_table
) AS subquery;
except the view's a bit more portable, as you don't have to write out the underlying query each time you want to work with whatever data it returns.
It can sometimes help, but it's no silver bullet. I've seen a view help performance, but I've also seen it hurt it. A view can force materialization, which can sometimes get you a better access path if MySQL isn't choosing a good one for you.