SQL Server vs MySQL: CONTAINS(*,'FORMSOF(THESAURUS,word)')

I am shocked.
I spent the past 3-4 days figuring out how I could implement stemming (and synonym searches) in MySQL, and then I see that in SQL Server the query is incredibly easy:
Select * from tab where CONTAINS(*,'FORMSOF(THESAURUS,word)')
Is it possible that MySQL doesn't have anything like that?

No, MySQL does not support matching against a user-provided thesaurus.
You can use an external FULLTEXT engine like Sphinx which supports morphology rules, has several stemmers and thesauri built in and allows pluggable ones.

Related

How to accomplish full text search in MySQL 5.5 Engine = innodb?

I've gone through many articles but couldn't find an alternative to MATCH ... AGAINST. I need this in one of my projects, and upgrading the database version is not an option.
Is there any way?

An alternative to MySQL fulltext search

I read that MySQL fulltext search can cause table locking. It means people can't insert or update the table when it's being searched on.
I read that there are many search servers (Lucene and Sphinx) that can do it without table locking and are even faster. But they require a lot of configuration and are hard to implement.
Is there any other way to use fulltext or some similar kind of searching without using a search service? I don't want to configure one more server other than MySQL.
Create an extra table which will be used only to perform FULLTEXT searches. In your code you have to ensure that all data and actions (create, update, delete) are properly replicated to this table. This solution is also handy if your data tables are running e.g. InnoDB engine.
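A minimal sketch of that idea, assuming hypothetical tables `articles` and `articles_search`. Python's built-in sqlite3 is used only so the example runs anywhere, and `LIKE` stands in for the `MATCH ... AGAINST` query you would run against a FULLTEXT index in MySQL. The point is that every write goes through one function that touches both tables.

```python
import sqlite3

# sqlite3 is a stand-in so the sketch is runnable; table and function names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
    CREATE TABLE articles_search (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
""")

def create_article(title, body):
    """Insert into the data table and mirror the row into the search table."""
    with conn:  # commits both inserts together, rolls back on error
        cur = conn.execute(
            "INSERT INTO articles (title, body) VALUES (?, ?)", (title, body))
        conn.execute(
            "INSERT INTO articles_search (id, title, body) VALUES (?, ?, ?)",
            (cur.lastrowid, title, body))
    return cur.lastrowid

def update_article(article_id, title, body):
    """Apply the same change to both tables."""
    with conn:
        for table in ("articles", "articles_search"):
            conn.execute(
                f"UPDATE {table} SET title = ?, body = ? WHERE id = ?",
                (title, body, article_id))

def delete_article(article_id):
    """Remove the row from both tables."""
    with conn:
        for table in ("articles", "articles_search"):
            conn.execute(f"DELETE FROM {table} WHERE id = ?", (article_id,))

def search(term):
    """Return matching ids from the search table only (LIKE replaces MATCH ... AGAINST here)."""
    rows = conn.execute(
        "SELECT id FROM articles_search WHERE body LIKE ? ORDER BY id",
        (f"%{term}%",))
    return [row[0] for row in rows]
```

Note that if the search table uses a non-transactional engine such as MyISAM, the two writes are not atomic in MySQL, so the application has to cope with a failure between them.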
Apache Lucene doesn't need much configuration and isn't hard to implement. Moreover, it's one of the most popular fulltext search engines, and allows users to run very precise queries, like "to be or not to be", j?hn d?e, func*, etc.
I already did some database indexing with Lucene, so if you could be a bit more precise about which fields of which tables you wanna index, I can give you pieces of code which should do the trick.
I vote for Sphinxsearch anyway. One of its APIs is close to MySQL's, and it is easy to install and configure. It is not as universal as Apache Lucene, yet it is quick and has been very helpful in my projects.

Postgres 9.1 vs Mysql 5.6 InnoDB?

Simple question - what would be better for a medium/big size database that requires ACID compliance in 2012?
I have read it all (well, most of it) about MySQL vs PostgreSQL, but most of those posts relate to versions 4/5.1 and 7/8 respectively and are quite dated (2008, 2009). It's almost 2012 now, so I guess we could try to take a fresh look at the issue.
Basically I would like to know if there is anything in PostgreSQL that outweighs the ease of use, availability and larger developer/knowledge base of MySQL.
Is MySQL's query optimizer still stupid? Is it still super slow on very complicated queries?
Hit me! :)
PS. And don't send me to Google or a wiki. I am looking for a few specific points, not an overview, and I trust StackOverflow more than some random page with a 'smart guy' shining his light.
Addendum
Size of the project: Say an ordering system with roughly 10-100 orders/day per account, couple of thousand accounts, eventually, each can have several hundred to several thousand users.
Better at: being future proof and flexible when it comes to growing and changing requirements. Performance is also important as to keep costs low in hardware department. Also availability of skilled workforce would be a factor.
OLTP or OLAP: OLTP
PostgreSQL is a lot more advanced when it comes to SQL features.
Things that MySQL still doesn't have (and PostgreSQL has):
deferrable constraints
check constraints (MySQL 8.0.16 added them, MariaDB 10.2 has them)
full outer join
MySQL silently uses an inner join with some syntax variations:
https://rextester.com/ADME43793
lateral joins
regular expressions don't work with UTF-8 (Fixed with MySQL 8.0)
regular expressions don't support replace or substring (Introduced with MySQL 8.0)
table functions ( select * from my_function() )
common table expressions (Introduced with MySQL 8.0)
recursive queries (Introduced with MySQL 8.0)
writeable CTEs
window functions (Introduced with MySQL 8.0)
function based index (supported since MySQL 8.0.15)
partial index
INCLUDE additional columns in an index (e.g. for unique indexes)
multi column statistics
full text search on transactional tables (MySQL 5.6 supports this)
GIS features on transactional tables
EXCEPT or INTERSECT operator (MariaDB has them)
you cannot use a temporary table twice in the same select statement
you cannot use the table being changed (update/delete/insert) in a sub-select
you cannot create a view that uses a derived table (Possible since MySQL 8.0)
create view x as select * from (select * from y);
statement level read consistency. Needed for e.g.: update foo set x = y, y = x or update foo set a = b, a = a + 100
transactional DDL
DDL triggers
exclusion constraints
key/value store
Indexing complete JSON documents
SQL/JSON Path expressions (since Postgres 12)
range types
domains
arrays (including indexes on arrays)
roles (groups) to manage user privileges (MariaDB has them, Introduced with MySQL 8.0)
parallel queries (since Postgres 9.6)
parallel index creation (since Postgres 11)
user defined data types (including check constraints)
materialized views
custom aggregates
custom window functions
proper boolean data type
(treating any expression that can be converted to a non-zero number as "true" is not a proper boolean type)
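The "statement level read consistency" item in the list above (`update foo set x = y, y = x`) can be illustrated with a small sketch. It assumes sqlite3 as a runnable stand-in for a database that evaluates every right-hand side against the row as it was before the update. The second function is only a plain-Python model of left-to-right assignment, which is how MySQL documents single-table UPDATE; it is not MySQL itself.

```python
import sqlite3

def swap_with_sqlite():
    """Run the swap in SQLite, which reads all right-hand sides from the old row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo (x INTEGER, y INTEGER)")
    conn.execute("INSERT INTO foo (x, y) VALUES (1, 2)")
    conn.execute("UPDATE foo SET x = y, y = x")
    return conn.execute("SELECT x, y FROM foo").fetchone()

def swap_left_to_right(x, y):
    """Model of left-to-right evaluation: the second assignment sees the new x."""
    row = {"x": x, "y": y}
    row["x"] = row["y"]  # x becomes the old y
    row["y"] = row["x"]  # y is assigned the already overwritten x
    return (row["x"], row["y"])

print(swap_with_sqlite())        # values are swapped
print(swap_left_to_right(1, 2))  # both columns end up with the old y
```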
When it comes to Spatial/GIS features Postgres with PostGIS is also much more capable. Here is a nice comparison.
Not sure what you call "ease of use" but there are several modern SQL features that I would not want to miss (CTEs, windowing functions) that would define "ease of use" for me.
Now, PostgreSQL is not perfect, and probably the most obnoxious thing is tuning the dreaded VACUUM process for a write-heavy database.
Is MySQL's query optimizer still stupid? Is it still super slow on very complicated queries?
All query optimizers are stupid at times. PostgreSQL's is less stupid in most cases. Some of PostgreSQL's more recent SQL features (windowing functions, recursive WITH queries etc) are very powerful but if you have a dumb ORM they might not be usable.
Size of the project: Say an ordering system with roughly 10-100 orders/day per account, couple of thousand accounts, eventually, each can have several hundred to several thousand users.
Doesn't sound that large - well within reach of a big box.
Better at: being future proof and flexible when it comes to growing and changing requirements.
PostgreSQL has a strong developer team, with an extended community of contributors. Release policy is strict, with bugfixes-only in the point releases. Always track the latest release of 9.1.x for the bugfixes.
MySQL has had a somewhat more relaxed attitude to version numbers in the past. That may change with Oracle being in charge. I'm not familiar with the policies of the various forks.
Performance is also important as to keep costs low in hardware department.
I'd be surprised if hardware turned out to be a major component in a project this size.
Also availability of skilled workforce would be a factor.
That's your key decider. If you've got a team of experienced Perl + PostgreSQL hackers sat around idle, use that. If your people know Lisp and MySQL then use that.
OLTP or OLAP: OLTP
PostgreSQL has always been strong on OLTP.
My personal viewpoint is that the PostgreSQL mailing lists are full of polite, helpful, knowledgeable people. You have direct contact with users with terabyte databases and hackers who have built major parts of the code. The quality of the support is truly excellent.
As an addition to @a_horse_with_no_name's answer, I want to name some features which I like very much in PostgreSQL:
arrays data type;
hstore extension - very useful for storing key->value data, possible to create index on columns of that type;
various language extensions - I find Python very useful when it comes to unstructured data handling;
distinct on syntax - I think this one should be ANSI SQL feature, it looks very natural to me (as opposed to MySQL group by syntax);
composite types;
record types;
inheritance;
Version 9.3 features:
lateral joins - one thing I miss from SQL Server (where it is called outer/cross apply);
native JSON support;
DDL triggers;
recursive, materialized, updatable views;
PostgreSQL is a more mature database, it has a longer history, it is more ANSI SQL compliant, its query optimizer is significantly better. MySQL has different storage engines like MyISAM, InnoDB, in-memory, all of them are incompatible in a sense that an SQL query which runs on one engine may produce a syntax error when executed on another engine. Stored procedures are better in PostgreSQL.

Any third party search engines (fulltext search and so on) work fine with InnoDB tables?

I know that InnoDB tables do not support fulltext searches yet. So I thought of using a third-party search engine like Solr, Xapian or Whoosh. Do those third-party tools work as well with InnoDB tables as they do with MyISAM tables? I need to find e.g. spelling suggestions and similar strings...
You could use Solr/Lucene to do the fulltext search over your DB data. Since my MySQL DB is too big for a fast fulltext search, I decided to combine MySQL and Solr/Lucene.
It's important to know that Solr/Lucene is not a MySQL plugin, so you will not be able to search the fulltext index with ordinary MySQL SQL statements. A fulltext search initiated by the application should first send the request to the third-party fulltext index (Solr), which returns the primary keys of the matching documents. The second step is to run an SQL statement against your MySQL InnoDB tables with a WHERE clause containing the primary keys from the Solr result set.
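The two-step lookup can be sketched roughly as follows. Everything here is an assumption made for illustration: `FAKE_INDEX` and `search_index` stand in for a real request to Solr, the `documents` table is hypothetical, and sqlite3 replaces MySQL so that the example is runnable.

```python
import sqlite3

# Stand-in for the MySQL InnoDB table that holds the real data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT);
    INSERT INTO documents (id, title) VALUES
        (1, 'InnoDB internals'),
        (2, 'Solr basics'),
        (3, 'Full-text search with Solr');
""")

# Hypothetical stand-in for the external index: term -> primary keys, best match first.
FAKE_INDEX = {"solr": [3, 2], "innodb": [1]}

def search_index(term):
    """Step 1: ask the fulltext index for the primary keys of matching documents."""
    return FAKE_INDEX.get(term.lower(), [])

def fetch_documents(term):
    """Step 2: load the rows by primary key, keeping the index's ranking."""
    ids = search_index(term)
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)  # one bound parameter per key
    rows = conn.execute(
        f"SELECT id, title FROM documents WHERE id IN ({placeholders})", ids
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # IN (...) does not preserve order, so re-apply the order from the index.
    return [by_id[i] for i in ids if i in by_id]
```

Keys that the index returns but the database no longer contains are silently skipped, which is one way to tolerate an index that lags behind the data.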
That solution works very well in my case and is much, much faster (and better) than a typical MySQL MyISAM fulltext search.
As an alternative, you could not only index the data in Solr but also store it there. In that case Solr is able to return the full text, so you don't need to get the data from the database as in the example above.
Do those third-party tools work as well with InnoDB tables as they do with MyISAM tables?
Absolutely. Solr has a DataImportHandler. There you define an SQL statement that fetches the data you would like to index in Solr, like: select * from MyTable;
But keep in mind: right now (as far as I know) there is no MySQL Solr plugin available. The cooperation between Solr and MySQL has to be handled by the application.
Third-party fulltext search engines typically copy data returned by a MySQL query, and use it to populate their search index. There's no difference between MyISAM and InnoDB data sources in this respect.
I gave a presentation Practical Full-Text Search in MySQL a few years ago. You might find it interesting.
Sphinx supports its own index and just takes data from MySQL on a timely basis by issuing a query.
It is not even aware of the underlying table structure and as long as the query runs and returns the results, it's OK for Sphinx.
Other third party engines work in a similar way.

MySQL limitations

When using MySQL 5.1 Enterprise after years of using other database products like Sybase, Informix and DB2, I run into things that MySQL just doesn't do. For example, it can only generate an EXPLAIN query plan for SELECT queries.
What other things should I watch out for?
You may take a look at long list here: MySQL Gotchas
Full outer joins. But you can still do a left outer join union right outer join.
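A small sketch of that workaround, assuming hypothetical `customers` and `orders` tables and using sqlite3 so it runs without a server. The right outer join is written as a left join with the tables swapped, which is equivalent and also works on SQLite versions without RIGHT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers (id, name) VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders (id, customer_id) VALUES (10, 1), (11, 3);
""")

FULL_OUTER_JOIN = """
    -- every customer, with its orders if any
    SELECT c.name, o.id
    FROM customers c LEFT JOIN orders o ON o.customer_id = c.id
    UNION
    -- every order, with its customer if any (a right join written as a left join)
    SELECT c.name, o.id
    FROM orders o LEFT JOIN customers c ON o.customer_id = c.id
"""

rows = conn.execute(FULL_OUTER_JOIN).fetchall()
for row in rows:
    print(row)
```

Because UNION removes duplicates, rows that are genuinely identical in the joined output collapse into one, which a native full outer join would not do.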
One thing I ran into late in a project is that MySQL date types can't store milliseconds. Datetimes and timestamps only resolve to seconds! I can't remember the exact circumstances that this came up but I ended up needing to store an int that could be converted into a date (complete with milliseconds) in my code.
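The integer workaround can be sketched like this, assuming the column stores whole milliseconds since the Unix epoch in UTC (for example in a BIGINT). The helper names are made up for illustration, and the original poster's code was not necessarily Python.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_millis(dt):
    """Convert a timezone-aware datetime to whole milliseconds since the epoch."""
    return (dt - EPOCH) // timedelta(milliseconds=1)

def from_epoch_millis(ms):
    """Convert stored milliseconds back to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)

# Round trip: the millisecond part survives being stored as an integer.
original = datetime(2012, 1, 2, 3, 4, 5, 678000, tzinfo=timezone.utc)
stored = to_epoch_millis(original)
restored = from_epoch_millis(stored)
```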
MySQL's JDBC driver caches results by default, to the point that it can cause your program to run out of memory (throwing OutOfMemory exceptions). You can turn it off, but you have to do it by passing some unusual parameters to the statement when you create it:
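// A forward-only, read-only statement with this fetch size makes the driver stream rows instead of buffering them all
Statement sx = c.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY);
sx.setFetchSize(Integer.MIN_VALUE);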
If you do that, the JDBC driver won't cache results in memory, and you can run huge queries. BUT what if you're using an ORM system? You don't create the statements yourself, and therefore you can't turn off caching. Which basically means you're completely screwed if your ORM system is trying to do something involving a lot of records.
If they had some sense, they would make that configurable by the JDBC URL. Oh well.
Allow for Roles or Groups
It doesn't cost a fortune. Try building a clustered website on multiple machines with multiple cores on an Oracle database. Ouch.
It still doesn't do CHECK constraints!