I'm really struggling to figure this query out:
names | zone | zones | zone_active
bob   | 1    | 1,2   | yes
bill  | 0    | 3     | yes
james | 1    | 1,2   | yes
fred  | 1    | 1,2   | no
barry | 1    | 4     | yes
I'm selecting zones '1,2' and zone_active='yes'.
But it's returning all rows except Bill and Barry; it seems to be ignoring the zone_active part.
SELECT p.names, n.zone, n.zones, n.zone_active
FROM names as n
JOIN people as p ON p.names=n.names
WHERE zone IN ('1,2') AND zone_active='yes'
It should only return bob and james.
Any ideas?
IN() does not treat a string as a list of discrete values just because the string contains commas. The value '1,2' is a single value, a string. So the operand will be compared to one value.
I assume zone is an integer (though you have not shown your table's data types, so that might be a wrong assumption). If you compare an integer to a string that has leading digits like '1,2', the initial digits are converted to an integer, and then compared.
So if zone is an integer column, then this:
WHERE zone IN ('1,2')
Is equivalent to this:
WHERE zone = 1
This does not match the row for bill because zone is 0 and the integer conversion for '1,2' is 1. So 0 = '1,2' is false.
The query should match the row for barry because 1 = '1,2' is true. I wonder if you made some mistake in your question.
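A rough way to see the conversion described above is to emulate it outside the database. The sketch below only approximates MySQL's string-to-number coercion (the `leading_int` helper is hypothetical, not a MySQL function) and applies it to the zone values from the question:

```python
import re


def leading_int(text):
    """Approximate how MySQL reads a number off the front of a string.

    Hypothetical helper for illustration only; real MySQL coercion has more rules.
    """
    match = re.match(r"\s*([+-]?\d+)", text)
    return int(match.group(1)) if match else 0


# zone values copied from the question's table
zones = {"bob": 1, "bill": 0, "james": 1, "fred": 1, "barry": 1}

# zone IN ('1,2') compares every zone against the single string value '1,2'
matched = [name for name, zone in zones.items() if zone == leading_int("1,2")]
print(matched)
```

Under this approximation only bill (zone 0) is excluded by the zone condition, which is why barry would be expected to match as well.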
Re your comment:
I want it to check if 1 or 2 is in the zone field
That would be:
WHERE zone IN (1,2) AND zone_active='yes'
But not:
WHERE zone IN ('1,2') AND zone_active='yes'
If you're trying to use query parameters, you need a separate parameter for each value:
WHERE zone IN (?, ?) AND zone_active='yes'
Because each parameter will be interpreted as one value. You can't use a single parameter for a list of values. If you try to use a parameter value of '1,2' that will become one string value, and that'll behave as I described above.
Some people try to fake this using FIND_IN_SET() but I would not recommend that. It will ruin any chance of optimizing the query with indexes.
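As a minimal sketch of the one-placeholder-per-value approach, the snippet below builds the IN list from a Python sequence. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption for the sake of a self-contained example), and the helper names `find_active_in_zones` and `make_demo_db` are made up for illustration:

```python
import sqlite3


def find_active_in_zones(conn, zones):
    """Return names whose zone is one of `zones` and whose zone_active is 'yes'."""
    if not zones:
        raise ValueError("need at least one zone value")
    # One "?" per value; a single "?" bound to "1,2" would be one string value.
    placeholders = ", ".join("?" for _ in zones)
    sql = (
        "SELECT names FROM names "
        f"WHERE zone IN ({placeholders}) AND zone_active = 'yes' "
        "ORDER BY names"
    )
    return [row[0] for row in conn.execute(sql, list(zones))]


def make_demo_db():
    """Build an in-memory table holding the rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE names (names TEXT, zone INTEGER, zones TEXT, zone_active TEXT)"
    )
    conn.executemany(
        "INSERT INTO names VALUES (?, ?, ?, ?)",
        [
            ("bob", 1, "1,2", "yes"),
            ("bill", 0, "3", "yes"),
            ("james", 1, "1,2", "yes"),
            ("fred", 1, "1,2", "no"),
            ("barry", 1, "4", "yes"),
        ],
    )
    return conn


if __name__ == "__main__":
    print(find_active_in_zones(make_demo_db(), [1, 2]))
```

With the question's data this returns barry along with bob and james, because barry's zone column is 1.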
I have a string S = "1-2-3-4-5-6-7-8"
This is what my database table rows look like:
id | SubSequence
1  | 1-2-4-5
2  | 1-3-4-5
3  | 2-5-7-8
4  | 5-8-9-10
5  | 6-7-10-11
and so on ...
I want to write a query that would update (in this example) only the first 3 rows because they're a subsequence of string S.
The current solution I have is to programmatically go thru each row, check if it's a subsequence, and update. But I'm wondering if there's a way to do it at the MySQL level for performance.
Update: I don't mind changing the way data is stored. For example, String S could be an array holding those numbers, and the "SubSequence" column can hold those numbers as an array.
No, there is not a way to do the query you describe with good performance in SQL when you store the subsequences as strings like you have done. The reason is that doing substring comparisons cannot be optimized with indexes, so your query will be forced to do the comparisons row by row.
In general, when you try to store sets of values as a string, but you want to use SQL to treat them as discrete values, it's bound to be awkward, difficult to code, and ultimately have bad performance.
In this case, what I would do is make two tables: one that numbers your entities, and a second table in which each value in your subsequence is stored on a row by itself.
SubSequences:
id
1
2
SubSequenceElements:
id | SubSequenceElement
1  | 1
1  | 2
1  | 4
1  | 5
2  | 1
2  | 3
2  | 4
2  | 5
And so on.
Then you can use relational-division techniques to find cases where every element of this set exists in the set you want to compare it to.
Here's an example:
SELECT s.id
FROM SubSequences AS s
LEFT OUTER JOIN (
SELECT id
FROM SubSequenceElements
WHERE SubSequenceElement NOT IN (1,2,3,4,5,6,7,8)
) AS invalid USING (id)
WHERE invalid.id IS NULL;
In other words, you want to return rows from SubSequences such that no match is found in SubSequenceElements with an element value that is not in the set you're trying to match.
It's a bit confusing, because you have to think about the problem as a double-don't-match-this-set problem. But once you get relational division, it can be very powerful.
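To try the relational-division query end to end, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL (an assumption; the join is written with ON instead of USING, and the helper names are invented for this example). It loads the five rows from the question into the two-table layout:

```python
import sqlite3


def ids_within(conn, allowed):
    """Return ids whose elements all appear in `allowed`."""
    placeholders = ", ".join("?" for _ in allowed)
    sql = f"""
        SELECT s.id
        FROM SubSequences AS s
        LEFT OUTER JOIN (
            SELECT id
            FROM SubSequenceElements
            WHERE SubSequenceElement NOT IN ({placeholders})
        ) AS invalid ON invalid.id = s.id
        WHERE invalid.id IS NULL
        ORDER BY s.id
    """
    return [row[0] for row in conn.execute(sql, list(allowed))]


def make_demo_db():
    """Store each subsequence element on its own row."""
    rows = {
        1: [1, 2, 4, 5],
        2: [1, 3, 4, 5],
        3: [2, 5, 7, 8],
        4: [5, 8, 9, 10],
        5: [6, 7, 10, 11],
    }
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SubSequences (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE SubSequenceElements (id INTEGER, SubSequenceElement INTEGER)"
    )
    for seq_id, elements in rows.items():
        conn.execute("INSERT INTO SubSequences VALUES (?)", (seq_id,))
        conn.executemany(
            "INSERT INTO SubSequenceElements VALUES (?, ?)",
            [(seq_id, element) for element in elements],
        )
    return conn


if __name__ == "__main__":
    print(ids_within(make_demo_db(), [1, 2, 3, 4, 5, 6, 7, 8]))
```

Note that this checks set membership only; it does not check the order of the elements.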
If the set can be represented by the numbers 0 through 63 (or some subset of that), then you could store it as a bitmask, using a column like this:
elements BIGINT UNSIGNED NOT NULL DEFAULT '0'
Then "2-5-7-8" could be put into it thus:
UPDATE ...
SET elements = (1<<2) | (1<<5) | (1<<7) | (1<<8);
Then various operations can be done in a single expression:
WHERE elements = (1<<2) | (1<<5) | (1<<7) | (1<<8) -- Test for exactly that set
WHERE (elements & ~ ( (1<<2) | (1<<5) | (1<<7) | (1<<8) )) != 0
-- checks to see if any other bits are turned on
This last example is close to what you need. One side of the "and not" would have the 1..8 of your example, the other would have the value stored in the column.
Your example has S represented as 0x1FE;
WHERE subsequence & ~0x1FE
will be 0 (false) for ids 1,2,3; non-zero (true) for ids 4 and 5.
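The same bit arithmetic can be checked outside the database. This is a small sketch in Python (the function names `to_mask` and `is_subset` are invented for the example) that encodes the question's rows as bitmasks and applies the "and not" test:

```python
def to_mask(sequence):
    """Turn a string like "2-5-7-8" into an integer with those bits set."""
    mask = 0
    for part in sequence.split("-"):
        value = int(part)
        if not 0 <= value <= 63:
            raise ValueError("only 0..63 fit in a BIGINT UNSIGNED bitmask")
        mask |= 1 << value
    return mask


def is_subset(mask, allowed_mask):
    """True when no bit is set in `mask` outside `allowed_mask`."""
    return (mask & ~allowed_mask) == 0


rows = {
    1: "1-2-4-5",
    2: "1-3-4-5",
    3: "2-5-7-8",
    4: "5-8-9-10",
    5: "6-7-10-11",
}
s_mask = to_mask("1-2-3-4-5-6-7-8")
matching_ids = [row_id for row_id, seq in rows.items() if is_subset(to_mask(seq), s_mask)]
print(hex(s_mask), matching_ids)
```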
So I'm currently using MySQL's JSON field to store some data.
So the 'reports' table looks like this:
id | stock_id | type | doc |
1 | 5 | Income_Statement | https://pastebin.com/bj1hdK0S|
The pastebin is the content of the json field
What I want to do is get a number (ebit) from the first object under yearly (2018-12-31) in the JSON and then use that in a WHERE clause so that it only returns rows where ebit > 50000000, for example. The issue is that the dates under yearly are not standard (i.e. one might be 2018-12-31, the other might be 2018-12-15). So essentially I want a way to get the data using integer indexes rather than the actual names of the objects, so something like yearly.[0].ebit.
How would I do this in MySQL? Alternatively, if it's not possible in MySQL, would it be possible in either PostgreSQL or Mongo? If so, could you give me an example? Most of the data fits well into MySQL; only this table has a JSON column, which is why I started with MySQL.
I don't know about MySQL or MongoDB, but here's a simple version for PostgreSQL's JSONB type:
SELECT (doc->'yearly'-> max(years) -> 'ebit')::numeric AS ebit
FROM reports, jsonb_object_keys(doc->'yearly') AS years
GROUP BY reports.doc;
...with simplistic test data:
WITH reports(doc) AS (
SELECT '{"yearly":{"2018-12-31":{"ebit":123},"2017-12-31":{"ebit":1.23}}}'::jsonb
)
SELECT (doc->'yearly'-> max(years) -> 'ebit')::numeric AS ebit
FROM reports, jsonb_object_keys(doc->'yearly') AS years
GROUP BY reports.doc;
...gives:
ebit
------
123
(1 row)
So I've basically selected the latest entry under "yearly" without knowing actual values but assuming that the key date formatting will allow a sort order (in this case it seems to comply with ISO-8601).
Using data type JSON instead of JSONB would preserve object key order but is not as efficient in PostgreSQL further down the road and wouldn't help here either.
If you then want to select only those reports entries whose latest ebit is greater than a certain value, just pack it into a sub-select or a CTE. I usually prefer CTEs because they are easier to read, so here we go:
WITH
reports (id, doc) AS (
VALUES
(1, '{"yearly":{"2018-12-31":{"ebit":123},"2017-12-31":{"ebit":1.23}}}'::jsonb),
(2, '{"yearly":{"2018-12-23":{"ebit":50},"2017-12-22":{"ebit":"1200.00"}}}'::jsonb)
),
r_ebit (id, ebit) AS (
SELECT reports.id, (reports.doc->'yearly'-> max(years) -> 'ebit')::numeric AS ebit
FROM reports, jsonb_object_keys(doc->'yearly') AS years
GROUP BY reports.id, reports.doc
)
SELECT id, ebit
FROM r_ebit
WHERE ebit > 100;
However, as you already see, it is not possible to filter the original rows using this strategy. A pre-processing step would make sense here so that the JSON format actually is filter-friendly.
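For comparison, the same "latest key, then filter" logic can be sketched in application code. The following Python is only an illustration under the same assumption as above (the keys under "yearly" are ISO-8601 dates, so the largest string is the latest date); `latest_ebit` and `reports_above` are hypothetical helper names:

```python
import json
from decimal import Decimal


def latest_ebit(doc):
    """Return the ebit of the latest entry under "yearly" as a Decimal."""
    yearly = json.loads(doc)["yearly"]
    latest_key = max(yearly)  # ISO-8601 dates sort correctly as plain strings
    # str() first, so both JSON numbers and JSON strings convert cleanly
    return Decimal(str(yearly[latest_key]["ebit"]))


def reports_above(reports, threshold):
    """Keep (id, ebit) pairs whose latest ebit exceeds `threshold`."""
    result = []
    for report_id, doc in reports:
        ebit = latest_ebit(doc)
        if ebit > threshold:
            result.append((report_id, ebit))
    return result


reports = [
    (1, '{"yearly":{"2018-12-31":{"ebit":123},"2017-12-31":{"ebit":1.23}}}'),
    (2, '{"yearly":{"2018-12-23":{"ebit":50},"2017-12-22":{"ebit":"1200.00"}}}'),
]
print(reports_above(reports, 100))
```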
ADDENDUM
To add the possibility of selecting the values for the n-th completed fiscal year, we need to resort to window functions, and we also need to reduce the resulting set so that it returns only a single row per actual group (in the demonstration case: reports.id):
WITH reports(id, doc) AS (VALUES
(1, '{"yearly":{"2018-12-31":{"ebit":123},"2017-12-31":{"ebit":1.23},"2016-12-31":{"ebit":"23.42"}}}'::jsonb),
(2, '{"yearly":{"2018-12-23":{"ebit":50},"2017-12-22":{"ebit":"1200.00"}}}'::jsonb)
)
SELECT DISTINCT ON (1) reports.id, (reports.doc->'yearly'-> (lead(years, 0) over (partition by reports.doc order by years desc nulls last)) ->>'ebit')::numeric AS ebit
FROM reports, jsonb_object_keys(doc->'yearly') AS years
GROUP BY 1, reports.doc, years.years ORDER BY 1;
...will behave exactly like the max aggregate function used previously. Increasing the offset parameter in the lead(years, <offset>) function call will select the n-th year backwards (because of the descending order of the window partition).
The DISTINCT ON (1) clause is the magic that reduces the result to a single row per distinct column value (first column = reports.id). This is why the NULLS LAST is very important inside the window OVER clause.
Here are results for different offsets (I've added a third historic entry for the first id but not for the second to also show how it deals with absent entries):
N = 0:
id | ebit
----+------
1 | 123
2 | 50
N = 1
id | ebit
----+---------
1 | 1.23
2 | 1200.00
N = 2
id | ebit
----+-------
1 | 23.42
2 |
...which means absent entries will just result in a NULL value.
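The offset behaviour can also be mirrored in a few lines of application code. This Python sketch is an illustration only (`nth_latest_ebit` is a made-up name); it sorts the keys under "yearly" in descending order and returns None where the SQL version yields NULL:

```python
import json
from decimal import Decimal


def nth_latest_ebit(doc, offset):
    """Return the ebit of the entry `offset` years back, or None if absent."""
    yearly = json.loads(doc)["yearly"]
    keys = sorted(yearly, reverse=True)  # latest date first
    if offset >= len(keys):
        return None  # corresponds to the NULL in the SQL result
    return Decimal(str(yearly[keys[offset]]["ebit"]))


doc_1 = '{"yearly":{"2018-12-31":{"ebit":123},"2017-12-31":{"ebit":1.23},"2016-12-31":{"ebit":"23.42"}}}'
doc_2 = '{"yearly":{"2018-12-23":{"ebit":50},"2017-12-22":{"ebit":"1200.00"}}}'

for n in range(3):
    print(n, nth_latest_ebit(doc_1, n), nth_latest_ebit(doc_2, n))
```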
I have a simple table with two fields, Words and Frequency...
- Words Frequency
- ABC 5
- DEF 7
- GHI 9
- ABC 3
- DEF 2
- GHI 1
The words are repeated with different frequencies and I want to sum the Frequency values for each word to
- ABC 8
- DEF 9
- GHI 10
in a query.
What you need is a GROUP BY clause
SELECT Words, SUM(frequency) AS TotalFrequency
FROM the_table
GROUP BY Words
ORDER BY Words;
The GROUP BY clause must list all the columns used in the select list to which no aggregate function (like MIN, MAX, AVG or SUM) is applied.
The column name generated for expressions is not defined. In Access, for instance, it depends on the language version of Access. The name might be SumOfFrequency in English but SummeVonFrequency in German. An application working on one PC might fail on another one. Therefore I suggest defining a column name explicitly with expr AS column_name.
You need a group by clause. What this clause does is to, for lack of a better word, group the result for each distinct value in the column(s) specified in it. Then, aggregate functions (like sum) can be applied separately for each group. So, in your use case, you'd want to group the rows per value in Words and then sum each group's Frequency:
SELECT Words, SUM(Frequency)
FROM MyTable
GROUP BY Words
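A quick way to confirm the result is to run the query against the sample data. The sketch below uses Python's sqlite3 module with an in-memory table (the table name MyTable and the helper names are assumptions for the demo):

```python
import sqlite3


def total_frequencies(conn):
    """Sum Frequency per word, with an explicit column alias."""
    sql = (
        "SELECT Words, SUM(Frequency) AS TotalFrequency "
        "FROM MyTable GROUP BY Words ORDER BY Words"
    )
    return conn.execute(sql).fetchall()


def make_demo_db():
    """Load the six sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (Words TEXT, Frequency INTEGER)")
    conn.executemany(
        "INSERT INTO MyTable VALUES (?, ?)",
        [("ABC", 5), ("DEF", 7), ("GHI", 9), ("ABC", 3), ("DEF", 2), ("GHI", 1)],
    )
    return conn


if __name__ == "__main__":
    for word, total in total_frequencies(make_demo_db()):
        print(word, total)
```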
I have some strings in my database. Some of them have numeric values (but in string format of course). I am displaying those values ordered ascending.
So we know, for string values, 10 is greater than 2 for example, which is normal. I am asking if there is any solution to display 10 after 2, without changing the code or the database structure, only the data.
If for example I have to display values from 1 to 10, I will have:
1
10
2
3
4
5
6
7
8
9
What I would like to have is
1
2
3
4
5
6
7
8
9
10
Is there a possibility to add an "invisible character or string which will be interpreted as greater than 9"? If I put a10 instead of 10, the a10 will be at the end, but is there any invisible or less visible character for that?
So, I repeat, I am not looking for a programming or database structure solution, but for a simple workaround.
You could try to cast the value as a number and then order by it:
select col
from yourtable
order by cast(col AS UNSIGNED)
You could try prepending the correct number of zeroes to the data:
01
02
03
..
10
11
..
99
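If the data is changed by a one-off script, the padding could be produced roughly like this. It is a sketch in Python, assuming every value is a plain non-negative number stored as a string (`pad_values` is a made-up helper name):

```python
def pad_values(values):
    """Left-pad numeric strings with zeroes to the width of the longest one."""
    width = max(len(value) for value in values)
    return [value.zfill(width) for value in values]


values = ["1", "10", "2", "3", "4", "5", "6", "7", "8", "9"]
padded = pad_values(values)

# A plain string sort now gives the numeric order
print(sorted(padded))
```

The padded values keep sorting correctly only as long as no later value is longer than the chosen width.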
Since you have a mixture of numbers and letters in this column, even if not in a single row, what you're really trying to do is a Natural Sort. This is not something MySQL can do natively. There are some workarounds, however. The best I've come across are:
Sort by length then value.
SELECT
mixedColumn
FROM
tableName
ORDER BY
LENGTH(mixedColumn), mixedColumn;
For more examples see: http://www.copterlabs.com/blog/natural-sorting-in-mysql/
Use a secondary column to use as a sort key that would contain some sort of normalized data (i.e. only numbers or only letters).
CREATE TABLE tableName (mixedColumn varchar(255), sortColumn int);
INSERT INTO tableName VALUES ('1',1), ('2',2), ('10',3),
('a',4),('a1',5),('a2',6),('b1',7);
SELECT
mixedColumn
FROM
tableName
ORDER BY
sortColumn;
This could get difficult to maintain unless you can figure out a good way to handle the ordering.
Of course if you were able to go outside of the database you'd be able to use natural sort functions from various programming languages.
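As one example of sorting outside the database, here is a minimal natural-sort key in Python. It is a sketch, not a library function: `natural_key` is a name invented here, and it simply splits each value into text and digit runs so the digit runs compare as numbers:

```python
import re


def natural_key(value):
    """Split into alternating text and digit runs; digit runs become ints."""
    parts = re.split(r"(\d+)", value)
    # re.split with a capturing group puts the digit runs at odd indexes
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]


mixed = ["a2", "10", "b1", "2", "a", "1", "a1"]
print(sorted(mixed, key=natural_key))
```

For the sample values this gives the same order as the sortColumn numbering in the table above.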
I have a MySQL database with a text field in which a number is stored.
I need to produce a recordset sorted in descending numerical order.
This works fine until we get to numbers of 10 or more, i.e.
9
8
7
6
5
4
3
2
10
1
Is there a simple way of sorting this 'correctly'?
(Yes, I know I should have numbers in a numerical field, but I'm working with what I have :))
I'm using the results on an asp/vbscript/jquery page so maybe even a client-side solution is viable...
Any suggestions?
ORDER BY ABS(text_column) DESC
Or, if you also have to deal with negative values:
ORDER BY CAST(text_column AS SIGNED) DESC
You need to cast it to an integer using the CAST function in MySQL:
ORDER BY CAST(text_column AS UNSIGNED INTEGER)
Try this one -
... ORDER BY text_column * 1
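To see the cast-based ordering in action without a MySQL server, the sketch below uses Python's sqlite3 module as a stand-in. This is an assumption for the demo: SQLite spells the cast as CAST(... AS INTEGER), whereas MySQL uses SIGNED or UNSIGNED, and the table and helper names are invented:

```python
import sqlite3


def sorted_descending(conn):
    """Order the text column numerically, largest first."""
    sql = "SELECT text_column FROM numbers ORDER BY CAST(text_column AS INTEGER) DESC"
    return [row[0] for row in conn.execute(sql)]


def make_demo_db():
    """Store the numbers 1..10 as text, as in the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE numbers (text_column TEXT)")
    conn.executemany(
        "INSERT INTO numbers VALUES (?)", [(str(n),) for n in range(1, 11)]
    )
    return conn


if __name__ == "__main__":
    print(sorted_descending(make_demo_db()))
```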