MySQL virtual column and wildcard - mysql

I was trying out MySQL secondary indexes on generated columns, following the MySQL documentation, and something weird happened.
First, I created a table, slightly modified from the example in the documentation:
create table jemp(
c JSON,
g VARCHAR(20) GENERATED ALWAYS AS (c->"$.name"),
INDEX i (g)
)
Second, I inserted values as in the documentation example:
INSERT INTO jemp (c) VALUES
('{"id": "1", "name": "Fred"}'), ('{"id": "2", "name": "Wilma"}'),
('{"id": "3", "name": "Barney"}'), ('{"id": "4", "name": "Betty"}');
Then I tried a fuzzy search with LIKE and wildcards. This query cannot use the index, because an index does not support a leading %, but it does return a result:
select c->"$.name" as name from jemp where g like "%F%"
Here is the weird thing: when I removed the leading %, the index was used, but I got no results. As far as I understand MySQL, this should work.
select c->"$.name" as name from jemp where g like "F%"
I would very much appreciate it if anyone could help me with this.

For your query to work, you want a generated column that extracts the name as text rather than JSON. That is, use ->> instead of ->:
g VARCHAR(20) GENERATED ALWAYS AS (c ->> '$.name')
Then the index may help for both of the following conditions:
where g like 'F%'
where g = 'F'
Whether MySQL decides to use it or not is another story; basically the database assesses whether using the index will be faster than a full scan. If it believes that the condition will match a large number of rows, it will probably choose a full scan.
Note that I consistently use single quotes for string literals; although MySQL tolerates otherwise, this is what the SQL standard specifies. In some other databases, double quotes stand for identifiers (this also is compliant with the standard).
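To see why the leading-% query matched while the prefix query did not, here is a small Python sketch. It is not MySQL itself; it only models the assumption that a value extracted with -> is stored in the VARCHAR column as JSON text, double quotes included, while ->> stores the unquoted text. The helper names are made up for the illustration, and the comparison is case-sensitive, unlike LIKE under MySQL's default collations.

```python
import json

doc = json.loads('{"id": "1", "name": "Fred"}')

# Models c->"$.name": the value is still JSON, so its text form keeps the quotes.
as_json_text = json.dumps(doc["name"])
# Models c->>"$.name": the value is unquoted plain text.
as_plain_text = doc["name"]

def like_prefix(value, prefix):
    """Rough stand-in for: value LIKE 'prefix%'."""
    return value.startswith(prefix)

def like_contains(value, fragment):
    """Rough stand-in for: value LIKE '%fragment%'."""
    return fragment in value

print(repr(as_json_text), like_contains(as_json_text, "F"), like_prefix(as_json_text, "F"))
print(repr(as_plain_text), like_prefix(as_plain_text, "F"))
```

With the quoted form the first character is a double quote, so a prefix match on F fails even though a substring match succeeds.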

Related

Join returns NULL when data that matches is in the table

I'm trying to get results when both tables have the same machine number and there are entries that have the same number in both tables.
Here is what I've tried:
SELECT fehler.*,
'maschine.Maschinen-Typ',
maschine.Auftragsnummer,
maschine.Kunde,
maschine.Liefertermin_Soll
FROM fehler
JOIN maschine
ON ltrim(rtrim('maschine.Maschinen-Nr')) = ltrim(rtrim(fehler.Maschinen_Nr))
The field I'm joining on is a varchar in both tables. I tried without the trims, but the query still returns an empty result.
I'm using MariaDB (if that's important).
ON ltrim(rtrim('maschine.Maschinen-Nr')) = ltrim(rtrim(fehler.Maschinen_Nr)) seems wrong...
Is fehler.Maschinen_Nr really the string 'maschine.Maschinen-Nr'?
SELECT fehler.*, maschine.`Maschinen-Typ`, maschine.Auftragsnummer, maschine.Kunde, maschine.Liefertermin_Soll
FROM fehler
JOIN maschine
ON ltrim(rtrim(maschine.`Maschinen-Nr`)) = ltrim(rtrim(fehler.`Maschinen_Nr`))
The last line of your query compared a string literal with the column value. This should do it.
Also, use backticks to quote column names, quoting the table name and the column name separately.
The single quotes are string delimiters. You are comparing fehler.Maschinen_Nr with the string 'maschine.Maschinen-Nr'. In standard SQL you would use double quotes for names (MariaDB allows this too when the ANSI_QUOTES SQL mode is enabled). In MariaDB the commonly used identifier delimiter is the backtick:
SELECT fehler.*,
maschine.`Maschinen-Typ`,
maschine.Auftragsnummer,
maschine.Kunde,
maschine.Liefertermin_Soll
FROM fehler
JOIN maschine
ON trim(maschine.`Maschinen-Nr`) = trim(fehler.Maschinen_Nr)
(It would be better of course not to use names with a minus sign or other characters that force you to use name delimiters in the first place.)
As you see, you can use TRIM instead of LTRIM and RTRIM. It would be better, though, not to allow spaces at the beginning or end when inserting data. Then you wouldn't have to remove them in every query.
Moreover, it seems Maschinen_Nr should be the primary key of the table maschine and, naturally, a foreign key in the table fehler. That would make sure fehler doesn't contain any Maschinen_Nr that does not exist in exactly that form in maschine.
To avoid these problems in the future, the usual convention for database names is snake case (lowercase_lowercase).
Besides that, posting your DB schema would be really helpful, since I can't guess your data structures.
(For developer-friendly code, it is useful to write variable, table and column names in English.)
That said, what error do you get? If table "maschine" has a column named "Maschinen-Nr" and table "fehler" has a column named "Maschinen_Nr" and the values match each other, the join should be correct.
Be careful with Maschinen-Nr and Maschinen_Nr. Do they have - and _ on purpose?
A very blind solution, because you don't really say what your problem is or what your schema looks like:
SELECT table1Alias.*, table2Alias.column_name, table2Alias.column_name
FROM table1 [table1Alias]
JOIN table2 [table2Alias]
ON ltrim(rtrim(table1Alias.matching_column)) = ltrim(rtrim(table2Alias.matching_column))
where the matching columns are respectively the PK and the FK, or at least columns whose data matches. The aliases in [] are optional; if omitted, the table name is used.

Query on a list of json in Mysql 5.6

I am using mysql 5.6 and it will not be feasible for me to upgrade it to 5.7. I have a table which stores json as an attribute. Attaching screenshot for reference.
Here, the column policy_status contains status and values of different policies as json for each user.
How can I find the list of users with, say, appVersion status success and value = 1437?
I got a few references online but as I am new to stored procedures I am not able to reach a solution. I will appreciate any help. Thanks in advance.
It is not efficient at all, but it may give you some ideas:
SELECT *
FROM data
WHERE
(LOCATE('"employmentType":["status":"success"]', policy_status) > 0
AND
LOCATE('"value": 1', policy_status) > 0);
Using the LOCATE function you can check whether the field contains the desired status and value strings (the example searches for employmentType; substitute appVersion and your own value). See sqlfiddle demo here.
With this simple test data:
CREATE TABLE data (
id INT UNSIGNED NOT NULL,
policy_status TEXT
);
INSERT INTO data (id, policy_status) VALUES
(1,'{"employmentType":["status":"success"], "value": 1}'),
(2,'{"employmentType":["status":"no"], "value": 1}'),
(3,'{"employmentType":["status":"no"], "value": 0}'),
(4,'{"employmentType":["status":"success"], "value": 0}'),
(5,'{"employmentType":["status":"no"], "value": 1}');
gives the result:
{"employmentType":["status":"success"], "value": 1}
Where both strings are found.
UPDATE:
Also, if you can add a FULLTEXT index on your policy_status column, then you can use full-text search in the WHERE clause:
...
WHERE
MATCH (policy_status) AGAINST ('+"employmentType:[status:success]" +"value: 1"' IN BOOLEAN MODE)
Note the + and " characters in AGAINST(...). They are special boolean full-text search operators. See here.
A leading or trailing plus sign indicates that this word must be
present in each row that is returned
and
A phrase that is enclosed within double quote (") characters matches
only rows that contain the phrase literally, as it was typed. The
full-text engine splits the phrase into words and performs a search in
the FULLTEXT index for the words. Nonword characters need not be
matched exactly.
If it is not an option in your case, you can use LIKE for matching the substrings:
...
WHERE
(policy_status LIKE '%"employmentType":["status":"success"]%'
AND
policy_status LIKE '%"value": 1%');
See sqlfiddle demo for both.

mySQL table with {"Twitter": 28, "Total": 28, "Facebook": 1}

There is a table with one column, named "info", with content like {"Twitter": 28, "Total": 28, "Facebook": 1}. When I write sql, I want to test whether "Total" is larger than 10 or not. Could someone help me write the query? (table name is landslides_7d)
(this is what I have)
SELECT * FROM landslides_7d WHERE info.Total > 10;
Thanks.
The data format seems to be JSON. If you have MySQL 5.7 you can use JSON_EXTRACT or the short form ->. Those functions don't exist in older versions.
SELECT * FROM landslides_7d WHERE JSON_EXTRACT(info, '$.Total') > 10;
or
SELECT * FROM landslides_7d WHERE info->'$.Total' > 10;
See http://dev.mysql.com/doc/refman/5.7/en/json-search-functions.html#function_json-extract
Mind that this is a full table scan. On a "larger" table you want to create an index.
If you're on an older version of MySQL, you should add an extra column to your table and manually store the total value in that column.
You probably are storing the JSON in a single blob or string column. This is very inefficient, since you can't make use of indexes, and will need to parse the entire JSON structure on every where query. I'm not sure how much flexibility you need, but if the JSON attributes are relatively fixed, I recommend running a script (ruby, Python, etc.) on the table contents and storing "total" in a traditional columnar format. For example, you could add a new column "total" which contains the total attribute as an INT.
A side benefit of using a script is that you can catch any improperly formatted JSON - something you can't do in a single query.
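Such a script could look roughly like the following Python sketch. The rows are made-up sample data in the shape of the question's info column, and the function name is hypothetical; reading the rows from MySQL and writing the new total column back are left out.

```python
import json

def extract_total(info_text):
    """Return the integer "Total" from a JSON string, or None if it is unusable."""
    try:
        doc = json.loads(info_text)
    except (json.JSONDecodeError, TypeError):
        return None  # malformed JSON or a NULL column value
    total = doc.get("Total") if isinstance(doc, dict) else None
    if isinstance(total, int) and not isinstance(total, bool):
        return total
    return None

# Made-up sample rows as (id, info) pairs; the third one is truncated on purpose.
rows = [
    (1, '{"Twitter": 28, "Total": 28, "Facebook": 1}'),
    (2, '{"Twitter": 3, "Total": 4, "Facebook": 1}'),
    (3, '{"Twitter": 28, "Total": '),
]

totals = {row_id: extract_total(info) for row_id, info in rows}
bad_ids = [row_id for row_id, total in totals.items() if total is None]
over_10 = [row_id for row_id, total in totals.items() if total is not None and total > 10]

print(totals)
print("rows needing attention:", bad_ids)
print("rows with Total > 10:", over_10)
```

The values in totals are what would be written into the new INT column, and bad_ids lists the rows whose JSON needs to be looked at by hand.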
You can also keep the "total" column maintained with a trigger (on update/insert of "info"), using the JSON_EXTRACT function referenced in @johannes's answer.

Why will delete query not recognise the value in column header process_status in table?

Why will the query not recognize the value of column header process_status in table?
I am using the query below to try to delete rows where the process_status is "L" only. However, when I run the query, the database asks me to enter a value for “L” instead of looking for that value in the column. Why is this?
DELETE SELECT UFA_Linked.*, UFA_Linked.ACCPED_ACCOUNT_NO
FROM UFA_Linked
WHERE (((UFA_Linked.ACCPED_ACCOUNT_NO) In (
SELECT [Account_No]
FROM [deals_extract]
WHERE [deal_type_description]<> "Term Extension" AND [deal_length_years]>5 AND [process_status] = “L”)));
I have tried re-arranging the query, as the same principle is working for "Term Extension".
I also tried not in ("all", "values", "other", "than" L), but the query then does not recognise the subsequent values.
I'm not having much luck searching for existing answers, probably because I'm phrasing my questions poorly.
You used curly quotes (“L”) instead of straight quotes ("L"). The characters “ and ” are ordinary symbols, not the special ones required for strings, so Access looks for a column named [“L”], cannot find it, treats it as a parameter and asks you to enter a value for it. Replace the quotes.

MySQL - FULLTEXT in BOOLEAN mode + Relevance using views field

I have the following table:
CREATE TABLE IF NOT EXISTS `search`
(
`id` BIGINT(16) NOT NULL AUTO_INCREMENT PRIMARY KEY,
`string` TEXT NOT NULL,
`views` BIGINT(16) NOT NULL,
FULLTEXT(string)
) ENGINE=MyISAM;
It has a total of 5,395,939 entries. To perform a search on a term (like 'a'), I use the query:
SELECT * FROM `search` WHERE MATCH(string) AGAINST('+a*' IN BOOLEAN MODE) ORDER BY `views` DESC LIMIT 10
But it's really slow =(. The query above took 15.4423 seconds to perform. Obviously, it's fast without sorting by views, which takes less than 0.002s.
I'm using ft_min_word_len=1 and ft_stopword_file=
Is there any way to use the views as the relevance in the fulltext search without making it too slow? I want the search term "a b" to match "big apple", for example, but not "ibg apple" (only prefixes of words need to match).
Thanks
Since no one answered my question, I'm posting my solution (not the one I would expect to find if I were googling, since it isn't as easy to apply as a simple database design would be, but it's still a solution to this problem).
I couldn't really solve it with any engine or function offered by MySQL. Sorry =/.
So, I decided to develop my own software to do it (in C++, but you can apply it in any other language).
If what you are looking for is a method to search for prefixes of words in small strings (the average length of my strings is 15), you can use the following algorithm:
1. Create a trie. Each word of each string is put on the trie.
Each leaf has a list of the ids that match that word.
2. Use a map/dictionary (or an array) to store the information
for each id (map[id] = information).
Searching for a string:
Note: The string will be in the format "word1 word2 word3...". If it has some symbols, like @, #, $, you might treat them as " " (spaces).
Example: "Rafael Perrella"
1. Search for the prefix "Rafael" in the trie. Put all the ids you
get in a set (a Binary-Search Tree that ignores repeated values).
Let's call this set "mainSet".
2. Search for the prefix "Perrella" in the trie. For each result,
put them in a second set (secSet) if and only if they are already
in the mainSet. Then, clear mainSet and do mainSet = secSet.
3. If there are still words left to search, repeat the second step
for all those words.
After these steps, you will have a set with all the results. Make a vector of (views, id) pairs and sort it in descending order. Then just take the results you want; I've limited it to 30 results.
Note: you can sort the words first to remove those with the same prefix (for example, in "Jan Ja Jan Ra" you only need "Jan Ra"). I will not explain it, since the algorithm is pretty obvious.
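The steps above can be sketched in Python as follows. This is only an illustration under a few assumptions, not the original C++ program: ids are stored on the node where a word ends (a word such as "a" can end on an inner node, not only on a leaf), words are lowercased and split on whitespace, and the class names and sample records are made up.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # character -> TrieNode
        self.ids = set()     # ids of strings that contain a word ending here


class PrefixIndex:
    def __init__(self):
        self.root = TrieNode()
        self.info = {}       # id -> (string, views)

    def add(self, record_id, string, views):
        self.info[record_id] = (string, views)
        for word in string.lower().split():
            node = self.root
            for ch in word:
                node = node.children.setdefault(ch, TrieNode())
            node.ids.add(record_id)

    def ids_with_prefix(self, prefix):
        """Collect the ids of every word that starts with prefix."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return set()
        found, stack = set(), [node]
        while stack:
            node = stack.pop()
            found |= node.ids
            stack.extend(node.children.values())
        return found

    def search(self, query, limit=30):
        words = query.lower().split()
        if not words:
            return []
        main_set = self.ids_with_prefix(words[0])
        for word in words[1:]:
            # Keep only ids that also match this word (the secSet step).
            main_set &= self.ids_with_prefix(word)
        ranked = sorted(((self.info[i][1], i) for i in main_set), reverse=True)
        return [record_id for _, record_id in ranked[:limit]]


index = PrefixIndex()
index.add(1, "big apple", 50)
index.add(2, "ibg apple", 90)
index.add(3, "apple bakery", 70)
print(index.search("a b"))   # ids matching both prefixes, highest views first
```

Here "a b" matches "big apple" and "apple bakery" but not "ibg apple", because no word in the latter starts with "b".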
This algorithm may be bad sometimes (for example, if I search for "a b c d e f ... z", I will search the entire trie...). So, I made an improvement.
1. For each "id" in your map, create also a small trie, that will
contain the words of the string (include a trie for each m[id]...
m[id].trie?).
Then, to make a search:
1. Choose the longest word in the search string (it's not guaranteed,
but it is probably the word with the fewest results in the trie...).
2. Apply the step 1 of the old algorithm.
3. Make a vector with the ids in the mainSet.
4. Let's make the final vector. For each id in the vector you've created
in step 3, search in the trie of this id (m[id].trie?) for all words
in the search string. If it includes all words, it's a valid id and
you might include it in the final vector; else, just ignore this id.
5. Repeat step 4 until there are no more ids to verify. After that, just
sort the final vector by <views, id>.
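A compact Python sketch of this improved search is shown below, again as an illustration rather than the original C++ code. It assumes nested dicts as tries, a reserved "$ids" key for the id sets in the global trie, lowercased whitespace-separated words, and made-up sample records; all function names are hypothetical.

```python
def trie_insert(trie, word, record_id=None):
    """Insert word into a nested-dict trie; optionally tag its end with an id."""
    node = trie
    for ch in word:
        node = node.setdefault(ch, {})
    if record_id is not None:
        node.setdefault("$ids", set()).add(record_id)

def trie_walk(trie, prefix):
    """Return the node reached by prefix, or None if no word starts with it."""
    node = trie
    for ch in prefix:
        node = node.get(ch)
        if node is None:
            return None
    return node

def trie_ids(trie, prefix):
    """Collect the ids of all words in the global trie that start with prefix."""
    start = trie_walk(trie, prefix)
    found, stack = set(), [start] if start is not None else []
    while stack:
        node = stack.pop()
        for key, child in node.items():
            if key == "$ids":
                found |= child
            else:
                stack.append(child)
    return found

records = {1: ("big apple", 50), 2: ("ibg apple", 90), 3: ("apple bakery", 70)}
global_trie, per_id_trie = {}, {}
for record_id, (string, _views) in records.items():
    per_id_trie[record_id] = {}
    for word in string.lower().split():
        trie_insert(global_trie, word, record_id)
        trie_insert(per_id_trie[record_id], word)   # the small per-id trie

def search(query, limit=30):
    words = query.lower().split()
    if not words:
        return []
    longest = max(words, key=len)                 # step 1: likely the rarest prefix
    candidates = trie_ids(global_trie, longest)   # steps 2 and 3
    final = [(records[i][1], i) for i in candidates   # step 4: check every word
             if all(trie_walk(per_id_trie[i], w) is not None for w in words)]
    final.sort(reverse=True)                      # step 5: highest views first
    return [record_id for _, record_id in final[:limit]]

print(search("app b"))
```

Only the longest word touches the global trie; every other word is checked against the small trie of each candidate, which keeps very short prefixes such as "a" from scanning most of the index.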
Now, I use the database just as a way to easily store and load my data. All queries against this table are sent directly to this software. When I add or remove a record, I send the change both to the DB and to the software, so I always keep both updated. It takes about 30s to load all the data, but then the queries are fast (0.03s for the slowest ones, 0.001s on average; measured on my own notebook, I didn't try it on dedicated hosting, where it might be much faster).