Can MySQL create a result set by inserting members from a loop?

MySQL can do while-loops.
Can it do something like this:
result_set = <EMPTY SET>
while (condition)
    SELECT foo INTO bar;
    if is_candidate then
        add (bar, baz) to result_set
    end if
end while
return result_set
Is this possible? If so, how portable is it?
Context, since I'm sure people will say I'm doing it wrong:
The declarative way would be SELECT whatever FROM table WHERE predicate(whatever), but in my case predicate has O(n) running time, where n is the size of the table, so the overall query has quadratic running time.
However, searching from known result rows to connected rows using is_candidate is O(1), so my pseudocode above would be linear-time overall.
(No, this data set isn't ideal for SQL, but it's what I've got to work with.)
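For what it's worth, MySQL 8.0 and later (and SQLite) support recursive common table expressions, which express "start from known rows and repeatedly add connected rows" without an explicit loop. The sketch below is an illustration under assumptions, not the asker's schema: it uses Python's built-in sqlite3 as a stand-in for MySQL, and the items table with its next_id and is_candidate columns is hypothetical. On older MySQL versions the usual route is a stored procedure that INSERTs each candidate into a temporary table inside the WHILE loop and SELECTs from that table at the end.

```python
import sqlite3

# Hypothetical table: each row may point to one "connected" row via next_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, next_id INTEGER, is_candidate INTEGER);
    INSERT INTO items VALUES (1, 2, 1), (2, 3, 1), (3, 4, 0), (4, NULL, 1), (5, 1, 1);
""")

# Seed the result set with a known row, then repeatedly follow next_id while
# the connected row is still a candidate. Each step is a primary-key lookup,
# so the work grows roughly with the number of rows visited.
# UNION (rather than UNION ALL) discards rows already produced, so a cycle ends.
WALK = """
    WITH RECURSIVE walk(id, next_id) AS (
        SELECT id, next_id FROM items WHERE id = ? AND is_candidate = 1
        UNION
        SELECT i.id, i.next_id
        FROM items AS i
        JOIN walk AS w ON i.id = w.next_id
        WHERE i.is_candidate = 1
    )
    SELECT id FROM walk ORDER BY id
"""

def connected_candidates(seed_id):
    """Return the ids reachable from seed_id through candidate rows."""
    return [row[0] for row in conn.execute(WALK, (seed_id,))]

print(connected_candidates(1))  # [1, 2]: row 3 is not a candidate, so the walk stops
```

The recursive CTE syntax is part of the SQL standard, but support and details vary by engine and version, so portability still needs checking against the databases you target.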

Best way in Doctrine to load only entities which have attached entities

I have an entity, let's call it Foo, and a second one, Bar.
Foo can (but doesn't have to) have one or multiple Bar entries assigned. It looks something like this:
/**
* @ORM\OneToMany(targetEntity="Bar", mappedBy="foo")
* @ORM\OrderBy({"name" = "ASC"})
*/
private $bars;
In one case, I would like to load only those Foo entities that have at least one Bar entity assigned. Previously, a foreach loop traversed all Foo entries, and any Foo with assigned Bars was added to an array.
My current implementation is a function in FooRepository called findIfTheresBar, which looks like this:
$qb = $this->_em->createQueryBuilder()
->select('e')
->from($this->_entityName, 'e')
/* some where stuff here */
->addOrderBy('e.name', 'ASC')
->join('e.bars', 'b')
->groupBy('e.id');
Is this the correct way to load such entries? Is there a better (faster) way? It kind of feels as if it should have a having(...) in the query.
EDIT:
I've investigated it a little further. The query should return 373 out of 437 entries.
Version 1: only using join(), this loaded 373 entries in 7.88ms
Version 2: using join() and having(), this loaded 373 entries in 8.91ms
Version 3: only using leftJoin(), this loaded all 437 entries (which isn't desired) in 8.05ms
Version 4: using leftJoin() and having(), this loaded 373 entries in 8.14ms
Since Version 1, which only uses an inner join (as @Chausser pointed out), is the fastest, I will stick with it.
Note: I'm not saying Version 1 will be the fastest in all scenarios and on all hardware, so, as a follow-up question: does anybody know of a performance comparison?
Please take a look at this answer for more information on how SQL JOINs work: https://stackoverflow.com/a/16598900/1307183
Using join(), which is an alias of innerJoin(), is exactly what you want. It only returns records where entries exist in both Foo and Bar, i.e. where the associated entity exists. It produces an INNER JOIN in SQL, which, if your database structure (keys and indexes) is defined correctly, is typically the most direct and efficient way to get the data you want.
Using leftJoin() produces a LEFT JOIN in SQL, which returns all records from Foo, even those with no associated Bar (for example, a Foo that no row in the bar table references through its foo foreign key).
You have no reason to use having() in any of the scenarios you described. If you want to filter further, you would do that with ->andWhere(). The having() clause is only needed when you filter on aggregate data selected in your query (like SELECT SUM(field) AS sum_field).
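To see the difference the answer describes at the SQL level, here is a rough sketch using Python's sqlite3 as a stand-in database. The foo/bar tables and their rows are hypothetical, and the SQL Doctrine actually generates will differ in detail; the point is only that an inner join keeps Foos with at least one Bar, while a left join keeps every Foo unless you filter afterwards (as Version 4 in the question does with HAVING).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT);
    -- In a OneToMany mapped by "foo", the foreign key lives on the bar side.
    CREATE TABLE bar (id INTEGER PRIMARY KEY, foo_id INTEGER REFERENCES foo(id), name TEXT);
    INSERT INTO foo VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO bar VALUES (10, 1, 'x'), (11, 1, 'y'), (12, 3, 'z');
""")

def ids(sql):
    """Run a query and return the first column of each row."""
    return [row[0] for row in conn.execute(sql)]

# Version 1: inner join + GROUP BY -> only Foos with at least one Bar.
inner = ids("""
    SELECT f.id FROM foo AS f
    JOIN bar AS b ON b.foo_id = f.id
    GROUP BY f.id ORDER BY f.name
""")

# Version 3: left join -> every Foo, including those with no Bar.
left = ids("""
    SELECT f.id FROM foo AS f
    LEFT JOIN bar AS b ON b.foo_id = f.id
    GROUP BY f.id ORDER BY f.name
""")

# Version 4: left join, then HAVING drops groups whose Bar count is zero.
left_having = ids("""
    SELECT f.id FROM foo AS f
    LEFT JOIN bar AS b ON b.foo_id = f.id
    GROUP BY f.id HAVING COUNT(b.id) > 0 ORDER BY f.name
""")

print(inner, left, left_having)  # [1, 3] [1, 2, 3] [1, 3]
```

Versions 1 and 4 return the same rows; the inner join simply never produces the unmatched rows that HAVING would otherwise have to discard.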

MySQL Add Column that Summarizes data from Another Column

I have a column in a MySQL table that holds 'messy' data stored as text, like this:
SIZE
2
2-5
6-25
2-10
26-100
48
50
I want to create a new column "RevTextSize" that maps the values in this column onto a predefined set of ranges:
If Size=2, then "RevTextSize"= "1-5"
If Size=2-5, then "RevTextSize"= "1-5"
If Size=6-25, then "RevTextSize"="6-25"
...
This is easy to do in Excel, SPSS and other such tools, but how can I do it in the MySQL table?
You can add a column like this:
ALTER TABLE messy_data ADD revtextsize VARCHAR(30);
To populate the column:
UPDATE messy_data
SET revtextsize
= CASE
WHEN size = '2' THEN '1-5'
WHEN size = '2-5' THEN '1-5'
WHEN size = '6-25' THEN '6-25'
ELSE size
END;
This is a brute-force approach, identifying each distinct value of size and specifying a replacement.
You could use another SQL statement to help you build the CASE expression:
SELECT CONCAT(' WHEN size = ''',d.size,''' THEN ''',d.size,'''') AS stmt
FROM messy_data d
GROUP BY d.size;
Save the result of that query into your favorite SQL text editor, and hack away at the replacement values. That speeds up writing the CASE expression for the UPDATE statement that sets the revtextsize column.
If you want to build something "smarter" that dynamically evaluates the contents of size and makes an intelligent choice, that would be more involved. If I were going to do that, I'd do it in the second statement, the one generating the CASE expression. I'd prefer to review that output before I run the UPDATE statement, because I want the UPDATE itself to do something that's easy to understand and easy to explain.
Use INSTR() to locate the "-" in your string, and SUBSTRING(str, pos, len) to extract the start and end numbers. Then use BETWEEN conditions to build your CASE expression.
Hope this will help in building your solution.
Thanks
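A rough sketch of the INSTR()/SUBSTRING()/BETWEEN idea, combined with the earlier advice to review a generated mapping before running the UPDATE. It uses Python's sqlite3 as a stand-in (SQLite spells the functions instr() and substr()), and the bucket boundaries 1-5, 6-25 and 26-100 are assumptions, since the question elides the full list. Values whose start and end fall in different buckets, such as 2-10, are deliberately left NULL for manual review rather than guessed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messy_data (size TEXT);
    INSERT INTO messy_data VALUES ('2'), ('2-5'), ('6-25'), ('2-10'), ('26-100'), ('48'), ('50');
    ALTER TABLE messy_data ADD COLUMN revtextsize TEXT;
""")

# Split each distinct size into a low and high number around the '-'
# (a single number is both low and high), then map it to an assumed bucket.
MAPPING_SQL = """
    WITH parsed AS (
        SELECT DISTINCT size,
            CAST(CASE WHEN instr(size, '-') > 0
                      THEN substr(size, 1, instr(size, '-') - 1) ELSE size END AS INTEGER) AS lo,
            CAST(CASE WHEN instr(size, '-') > 0
                      THEN substr(size, instr(size, '-') + 1) ELSE size END AS INTEGER) AS hi
        FROM messy_data
    )
    SELECT size,
        CASE
            WHEN lo BETWEEN 1 AND 5 AND hi BETWEEN 1 AND 5 THEN '1-5'
            WHEN lo BETWEEN 6 AND 25 AND hi BETWEEN 6 AND 25 THEN '6-25'
            WHEN lo BETWEEN 26 AND 100 AND hi BETWEEN 26 AND 100 THEN '26-100'
            ELSE NULL  -- spans buckets or out of range: review by hand
        END AS revtextsize
    FROM parsed
"""

# Step 1: build the mapping and review it before touching the table.
mapping = dict(conn.execute(MAPPING_SQL).fetchall())
print(mapping)

# Step 2: apply the reviewed mapping with a plain, easy-to-explain UPDATE.
conn.executemany(
    "UPDATE messy_data SET revtextsize = ? WHERE size = ?",
    [(rev, size) for size, rev in mapping.items()],
)
conn.commit()
```

In MySQL the same parsing could be written with INSTR() and SUBSTRING() as the answer suggests; the two-step shape keeps the final UPDATE as simple as the brute-force version.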

MySQL Query Tuning - Why is using a value from a variable so much slower than using a literal?

UPDATE: I've answered this myself below.
I'm trying to fix a performance issue in a MySQL query. What I think I'm seeing is that assigning the result of a function to a variable, and then running a SELECT that compares against that variable, is relatively slow.
However, if for testing's sake I replace the comparison against the variable with a comparison against the string literal that I know the function will return (for a given scenario), the query runs much faster.
For example:
...
SET @metaphone_val := double_metaphone(p_parameter); -- double_metaphone is user-defined
SELECT
SQL_CALC_FOUND_ROWS
t.col1,
t.col2,
...
FROM table t
WHERE
t.pre_set_metaphone_string = @metaphone_val -- OPTION A
t.pre_set_metaphone_string = 'PRN' -- OPTION B (Literal function return value for a given name)
If I use the line in option A, the query is slow.
If I use the line in option B, the query is as fast as you would expect any simple string comparison to be.
Why?
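I had finished writing the question when the answer hit me, so I'm posting it anyway to share the knowledge.
I realised that the return value of the metaphone function was UTF-8.
Comparing it to a latin1 column was incurring a heavy performance overhead, most likely because MySQL has to convert the column values to UTF-8 for the comparison, which stops it from using an index on that column.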
I replaced the variable assignment with:
SET @metaphone_val := CONVERT(double_metaphone(p_parameter) USING latin1);
Now the query runs as fast as I would expect.
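SQLite has no per-column character sets, so the following is only an analogue of the MySQL situation, using a collation mismatch instead of a charset mismatch: when the comparison must happen under a different collation than the index was built with, the planner can no longer use the index and falls back to a full scan. The table and index names are hypothetical, and EXPLAIN QUERY PLAN wording varies between SQLite versions, so the sketch only looks for the words SEARCH and SCAN. In MySQL itself, running EXPLAIN on both variants of the query is the corresponding check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT,
                         pre_set_metaphone_string TEXT);
    CREATE INDEX idx_metaphone ON people (pre_set_metaphone_string);
""")

def plan(sql, params=()):
    """Return the EXPLAIN QUERY PLAN detail strings for a query, joined."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " | ".join(row[-1] for row in rows)  # last column is the detail text

# Same collation as the index (BINARY by default): the index can be used.
fast = plan("SELECT col1, col2 FROM people WHERE pre_set_metaphone_string = ?", ("PRN",))

# Forcing a different collation on the comparison plays the role of MySQL
# converting a latin1 column to UTF-8: the index no longer applies.
slow = plan(
    "SELECT col1, col2 FROM people WHERE pre_set_metaphone_string = ? COLLATE NOCASE",
    ("PRN",),
)

print(fast)  # should mention SEARCH ... USING INDEX idx_metaphone
print(slow)  # should mention SCAN (a full table scan)
```

The fix in the answer works the same way in spirit: converting the variable to latin1 makes the comparison match the column, so the index can be used again.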

SQL select everything with arbitrary IN clause

This will sound silly, but trust me, it is for a good (i.e. over-engineered) cause.
Is it possible to write a SQL query using an IN clause which selects everything in that table without knowing anything about the table? Keep in mind this would mean you can't use a subquery that references the table.
In other words I would like to find a statement to replace "SOMETHING" in the following query:
SELECT * FROM table_a WHERE table_a.id IN (SOMETHING)
so that the results are identical to:
SELECT * FROM table_a
by doing nothing beyond changing the value of "SOMETHING"
To satisfy the curious I'll share the reason for the question.
1) I have a FactoryObject abstract class which grants all models that extend it some glorious factory method magic using two template methods: getData() and load()
2) Models must implement the template methods. getData is a static method that accepts ID constraints, pulls rows from the database, and returns a set of associative arrays. load is not static, accepts an associative array, and populates the object based on that array.
3) The non-abstract part of FactoryObject implements getObject() and getObjects() methods. These call getData(), create objects, and call load() on each one with an array returned by getData() to create and return populated objects.
getObjects() requires ID constraints as an input, either in the form of a list or in the form of a subquery, which are then passed to getData(). I wanted to make it possible to pass in no ID constraints to get all objects.
The problem is that only the models know about their tables. getObjects() is implemented at a higher level and so it doesn't know what to pass getData(), unless there was a universal "return everything" clause for IN.
There are other solutions. I can modify the API to require getData to accept a special parameter and return everything, or I can implement a static getAll[ModelName]s() method at the model level which calls:
static function getAllModelObjects() {
return static::getObjects("select [model].id from [model]");
}
This is reasonable and may fit the architecture anyway, but I was curious so I thought I would ask!
Works on SQL Server:
SELECT * FROM table_a WHERE table_a.id IN (table_a.id)
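A quick way to sanity-check that trick, sketched with Python's sqlite3 (the table_a contents and the parent_id column are hypothetical). The condition id IN (id) reduces to id = id, which is true for every non-NULL id. Standard SQL allows column references in an IN list, so MySQL and SQLite should accept the same query. The one caveat is NULL: NULL IN (NULL) is not true, so on a nullable column those rows disappear, which is harmless for a primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, parent_id INTEGER);
    INSERT INTO table_a VALUES (1, NULL), (2, 1), (3, 1);
""")

everything = conn.execute("SELECT * FROM table_a ORDER BY id").fetchall()

# id IN (id) reduces to id = id, true for every non-NULL id.
via_in = conn.execute(
    "SELECT * FROM table_a WHERE table_a.id IN (table_a.id) ORDER BY id"
).fetchall()

# Caveat: on a nullable column, NULL IN (NULL) is not true, so that row is dropped.
nullable = conn.execute(
    "SELECT * FROM table_a WHERE table_a.parent_id IN (table_a.parent_id) ORDER BY id"
).fetchall()

print(via_in == everything)  # True
print(len(nullable))         # 2
```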
Okay, I hate saying no so I had to come up with another solution for you.
Since MySQL is open source, you can get the source and incorporate a new feature that understands the infinity symbol. Then you just need to get the MySQL community to buy into the usefulness of this feature (steer the conversation away from security as much as possible in your attempts to do so), and then get your company to upgrade their DBMS to the new version once this feature has been implemented.
Problem solved.
The answer is simple. The workaround is to add some criteria like these:
# to query on a number column
AND (-1 in (-1) OR sample_table.sample_column in (-1))
# or to query on a string column
AND ('%' in ('%') OR sample_table.sample_column in ('%'))
Therefore, in your example, the following two queries should return the same result when you pass -1 as the parameter value:
SELECT * FROM table_a;
SELECT * FROM table_a WHERE (-1 in (-1) OR table_a.id in (-1));
And whenever you want to filter something, you can pass it as the parameter. For example, the following query returns only the records with an id of 1, 2 or 6:
SELECT * FROM table_a WHERE (-1 in (1, 2, 6) OR table_a.id in (1, 2, 6));
In this case, we have a default value like -1 or % and we have a parameter that can be anything. If the parameter is the default value, nothing is filtered.
I suggest % as the default value if you are querying a text column, or -1 if you are querying the table's PK. But it's entirely up to you to substitute % or -1 with whatever reserved character or number you decide on.
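Here is a minimal sketch of that default-value pattern with bound parameters, assuming -1 never occurs as a real id. It uses Python's sqlite3 and a hypothetical table_a; because the caller may pass a list such as (1, 2, 6), the helper generates one placeholder per element and binds the same list into both IN (...) lists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO table_a VALUES (1, 'a'), (2, 'b'), (3, 'c'), (6, 'f'), (7, 'g');
""")

SENTINEL = -1  # assumed never to be a real id

def select_ids(ids):
    """Return matching ids; passing [SENTINEL] disables the filter (ids must be non-empty)."""
    placeholders = ", ".join("?" for _ in ids)
    sql = (
        "SELECT id FROM table_a "
        f"WHERE (? IN ({placeholders}) OR table_a.id IN ({placeholders})) "
        "ORDER BY id"
    )
    # Bind order: the sentinel first, then the id list once for each IN (...).
    params = [SENTINEL, *ids, *ids]
    return [row[0] for row in conn.execute(sql, params)]

print(select_ids([SENTINEL]))  # [1, 2, 3, 6, 7]
print(select_ids([1, 2, 6]))   # [1, 2, 6]
```

Only "?" markers are interpolated into the SQL string; the values themselves are always bound, so the list can come from user input.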
Similar to @brandonmoore:
select * from table_a where table_a.id not in ('0')
How about:
select * from table_a where table_a.id not in ('somevaluethatwouldneverpossiblyexistintable_a.id')
EDIT:
As much as I would like to keep thinking of a way to solve your problem, I know there isn't one, so I figure I'll go ahead and be the first person to tell you, so I can at least get credit for the answer. It's truly a bittersweet victory though :/
If you provide more info though maybe I or someone else can help you think of another workaround.

Union (or Concat, etc..) with Constant values and projection

I've discovered a very nasty gotcha with LINQ to SQL, and I'm not sure what the best solution is.
If you take a simple L2S Union statement with L2S column expressions on one side and constants on the other, the constants are not included in the SQL UNION; they are only projected into the output after the SQL runs. The result is a SQL error saying that the number of columns in the union doesn't match.
As an example:
(from d in dc.mytable where foo == "bar" select new {First = d.Foo, Second = d.Roo})
.Union(from e in dc.mytable where foo == "roo" select new {First= "", Second = e.Roo})
This will generate the error "All queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists."
This is particularly insidious (and maddening) because there are obviously the same number of expressions in the list, but when you look at the SQL, you will notice that it does not generate a column for "First" in the second half of the Union. This is because "First" is inserted into the projection AFTER the query.
OK, the easy solution is to convert each part into an Enumerable or List and do the union in memory rather than in SQL, and that's fine if you're dealing with a small amount of data. However, if you're working with a large data set that you then plan to filter further (in SQL) before returning it, this is not ideal.
I guess what I'm looking for is a way to force L2S to include the column in the SQL. Is that possible?
UPDATE:
While not an exact duplicate, this error is similar to this question and has similar solutions. So I'm closing this question, but not deleting it, because it may help someone else arrive at possible solutions from a different direction.
Unfortunately, L2S is sometimes too smart for its own good.
I've decided that the only real solution is to use a stored proc. Hope this helps.
This is a bug in the Linq2SQL provider.
In LINQPad you can clearly see the bug.
(from d in dc.mytable where foo == "bar" select new {First = d.Foo, Second = d.Roo})
.Union(from e in dc.mytable where foo == "roo" select new {First= "", Second = e.Roo})
This will produce something like the following on the server side:
SELECT [t2].[Foo], [t2].[Roo]
FROM (
SELECT [t0].[Foo], @p0 AS [value]
FROM [dc].[Mytable] AS [t0]
UNION ALL
SELECT [t1].[Foo], [t1].[Roo]
FROM [dc].[Mytable] AS [t1]
) AS [t2]
This will be a problem because the union will name the second column "value" instead of "Roo", which will cause the outer query to fail.
If, however, you switch the order of the two queries
(from e in dc.mytable where foo == "roo" select new {First= "", Second = e.Roo})
.Union(from d in dc.mytable where foo == "bar" select new {First = d.Foo, Second = d.Roo})
so that the constant assignment in the generated T-SQL lands in the second (non-first) SELECT, things may work, because T-SQL ignores the column names of every SELECT after the first.
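Note: The first SELECT in a union decides the column names of the result (in SQL Server, the result's data types are resolved across all the SELECTs using data type precedence). So it would be smart to get LINQPad anyway.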
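The two SQL rules at play here can be checked directly, independent of LINQ to SQL. The sketch below uses Python's sqlite3 rather than SQL Server, so the error text differs from the T-SQL message quoted in the question, but the behaviour is analogous: a UNION whose halves have different column counts is rejected, and the result's column names come from the first SELECT. The column aliases are arbitrary examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The result's column names come from the first SELECT of the UNION.
cur = conn.execute("SELECT 'x' AS value, 1 AS Roo UNION SELECT 'y' AS Foo, 2 AS Other")
names = [col[0] for col in cur.description]
print(names)  # ['value', 'Roo']

# A UNION whose halves have different column counts is rejected outright,
# which is the situation the question describes when the constant is left
# out of the generated SQL.
try:
    conn.execute("SELECT 'x', 1 UNION SELECT 2")
    mismatch_error = None
except sqlite3.OperationalError as exc:
    mismatch_error = str(exc)
print(mismatch_error)
```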