Database choices for big data [closed]

I have many text files whose total size is about 300 GB to 400 GB. They are all in this format:
key1 value_a
key1 value_b
key1 value_c
key2 value_d
key3 value_e
....
Each line consists of a key and a value. I want to create a database that lets me query all values of a key. For example, when I query key1, value_a, value_b, and value_c are returned.
First of all, inserting all these files into the database is a big problem. I tried inserting chunks of a few GB into a MySQL MyISAM table with the LOAD DATA INFILE syntax, but it appears MySQL can't utilize multiple cores for inserting data, and it is painfully slow. So I don't think MySQL is a good choice here for so many records.
Also, I need to update or recreate the database periodically, weekly or even daily if possible; therefore, insertion speed is important to me.
It's not possible for a single node to do the computing and insertion efficiently; I think it's better to perform the insertion on different nodes in parallel.
For example,
node1 -> compute and store 0-99999.txt
node2 -> compute and store 100000-199999.txt
node3 -> compute and store 200000-299999.txt
....
So, here comes the first criterion.
Criterion 1: fast insertion speed, in a distributed batch manner.
Then, as you can see in the text file example, the same key can map to multiple values, just as key1 maps to value_a/value_b/value_c in the example.
Criterion 2: duplicate keys are allowed (one key maps to multiple values).
Then, I will need to query keys in the database. No relational or complex join queries are required; all I need is simple key/value querying. The important part is that one key can return multiple values.
Criterion 3: simple and fast key/value querying.
I know there are HBase/Cassandra/MongoDB/Redis... and so on, but I'm not familiar with all of them and am not sure which one fits my needs. So the question is: what database should I use? If none of them fits my needs, I would even plan to build my own, but that takes effort :/
Thanks.

There are probably a lot of systems that would fit your needs. Your requirements make things pleasantly easy in a couple of ways:
Because you don't need any cross-key operations, you could use multiple databases, dividing keys between them via hash or range sharding. This is an easy way to solve the lack of parallelism that you observed with MySQL and probably would observe with a lot of other database systems.
Because you never do any online updates, you can just build an immutable database in bulk and then query it for the rest of the day/week. I'd expect you'd get a lot better performance this way.
I'd be inclined to build a set of hash-sharded LevelDB tables. That is, I wouldn't use an actual leveldb::DB, which supports a more complex data structure (a stack of tables and a log) so that you can do online updates; instead, I'd directly use leveldb::Table and leveldb::TableBuilder objects (no log, only one table for a given key). This is a very efficient format for querying. And if your input files are already sorted like in your example, the table building will be extremely efficient as well.
You can achieve whatever parallelism you desire by increasing the number of shards: if you're using a 16-core, 16-disk machine to build the database, use at least 16 shards, all generated in parallel. If you're using 16 machines with 16 cores and 16 disks each, use at least 256 shards. If you have a lot fewer disks than cores, as many people do these days, try both, but you may find fewer shards are better to avoid seeks.
If you're careful, I think you can basically max out the disk throughput while building tables, and that's saying a lot, as I'd expect the tables to be noticeably smaller than your input files due to the key prefix compression (and optionally Snappy block compression).
You'll mostly avoid seeks because, aside from a relatively small index that you can typically buffer in RAM, the keys in the LevelDB tables are stored in the same order as you read them from the input files, again assuming that your input files are already sorted. If they're not, you may want enough shards that you can sort a shard in RAM and then write it out, perhaps processing shards more sequentially.
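To make that concrete, here is a minimal sketch in Python using the plyvel LevelDB binding (an assumption on my part; the approach above uses the C++ leveldb::Table/leveldb::TableBuilder API directly, which plyvel does not expose). It hash-shards keys across several LevelDB databases and appends a sequence number so that one key can hold multiple values:

import hashlib
import plyvel  # assumed binding; the answer itself describes the C++ API

NUM_SHARDS = 16  # at least one shard per core/disk, as discussed above

def shard_for(key: bytes) -> int:
    # Stable hash sharding: the same key always lands in the same shard.
    return int.from_bytes(hashlib.md5(key).digest()[:4], "big") % NUM_SHARDS

def build_shard(shard_id: int, pairs):
    # 'pairs' is the (key, value) stream already routed to this shard;
    # each shard can be built by a separate process or node in parallel.
    db = plyvel.DB("shard-%04d" % shard_id, create_if_missing=True)
    with db.write_batch() as batch:
        for seq, (key, value) in enumerate(pairs):
            # key + NUL + sequence number keeps duplicate keys distinct
            batch.put(key + b"\x00" + str(seq).encode(), value)
    db.close()

def lookup(shard_dbs, key: bytes):
    # All values for a key live in one shard, under a common prefix.
    db = shard_dbs[shard_for(key)]
    return [v for _, v in db.iterator(prefix=key + b"\x00")]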

I would suggest SSDB (https://github.com/ideawu/ssdb), a LevelDB server that is suitable for storing collections of data.
You can store the data in maps:
// Pseudo-code: in SSDB's actual API, hset takes (name, field, value),
// so each value needs its own field within the key1 map.
ssdb->hset("key1", "1", "value_a")
ssdb->hset("key1", "2", "value_b")
...
list = ssdb->hscan("key1", 1000);
// now list = [value_a, value_b, ...]
SSDB is fast (about half the speed of Redis, roughly 30,000 insertions per second). It is a network wrapper around LevelDB, with one-line installation and startup. Its clients include PHP, C++, Python, Java, Lua, and more.

The traditional answer would be to use Oracle if you have the big bucks, or PostgreSQL if you don't. However, I'd suggest you also look at solutions like MongoDB, which I found to be blazing fast and which will also accommodate a scenario where your schema is not fixed and can change across your data.

Since you are already familiar with MySQL, I suggest trying all MySQL options before moving to a new system.
Many big data systems are tuned for very specific problems but don't fare well in areas that are taken for granted in an RDBMS. Also, most applications need regular RDBMS features alongside big data features, so moving to a new system may create new problems.
Also consider the software ecosystem, community support and knowledge base available around the system of your choice.
Coming back to the solution: how many rows will there be in the database? This is an important metric. I am assuming more than 100 million.
Try partitioning. It can help a lot. The fact that your selection criteria are simple and you don't require joins only makes things better.
Postgres has a nice way of handling partitions. It requires more code to get up and running but gives amazing control. Unlike MySQL, Postgres does not have a hard limit on the number of partitions. Partitions in Postgres are regular tables, which gives you much more control over indexing, searching, backup, restore, parallel data access, and so on.
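For illustration, here is a minimal sketch of declarative partitioning (Postgres 11+, which needs less code than the older inheritance-based approach this answer alludes to), via the psycopg2 driver; the table and column names are hypothetical:

import psycopg2  # assumed driver; schema below is hypothetical

conn = psycopg2.connect("dbname=kv")
cur = conn.cursor()

# Parent table, hash-partitioned on the key so load spreads evenly.
cur.execute("""
    CREATE TABLE kv (
        k TEXT NOT NULL,
        v TEXT NOT NULL
    ) PARTITION BY HASH (k)
""")

# Each partition is a regular table, so it can be indexed, backed up,
# or bulk-loaded (e.g. with COPY) independently and in parallel.
for i in range(16):
    cur.execute(
        "CREATE TABLE kv_p%d PARTITION OF kv "
        "FOR VALUES WITH (MODULUS 16, REMAINDER %d)" % (i, i)
    )
cur.execute("CREATE INDEX ON kv (k)")
conn.commit()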

Take a look at HBase. You can store multiple values against a key by using columns. Unlike an RDBMS, you don't need a fixed set of columns in each row; a row can have an arbitrary number of columns. Since you query data by a key (the row key in HBase parlance), you can retrieve all the values for a given key by reading the values of all the columns in that row.
HBase also has the concept of a retention period, so you can decide how long each column lives. Hence, the data can get cleaned up on its own, as needed. There are some interesting techniques people have employed using retention periods.
HBase is quite scalable, and supports very fast reads and writes.
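As a small illustration via the happybase Python client (my assumption; any HBase client works the same way), with a hypothetical table 'kv' whose column family 'v' was created beforehand:

import happybase  # assumed client; table/family names are hypothetical

connection = happybase.Connection("hbase-host")
table = connection.table("kv")  # assumes column family 'v' already exists

# One row per key; each value gets its own column in the 'v' family.
table.put(b"key1", {b"v:1": b"value_a"})
table.put(b"key1", {b"v:2": b"value_b"})
table.put(b"key1", {b"v:3": b"value_c"})

# Reading the row returns all columns, i.e. all values for the key.
row = table.row(b"key1")
print(sorted(row.values()))  # [b'value_a', b'value_b', b'value_c']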

InfoBright may be a good choice.

Related

Big quantity of data with MySql

I have a question about mass storage. I'm working with 5 sensors that each send a lot of data at a different frequency, and I'm using a MySQL database.
So here are my questions:
1) Is MySQL the right solution?
2) If not, is there a solution for storing this big quantity of data in a database?
3) I'm using threads together with mutexes, and I'm afraid this can cause problems. Actually, it seems to be doing so.
I hope I will get an answer to this question.
MySQL is a good solution for OLTP scenarios where you are storing transactions to serve web or mobile apps, but it does not scale well (despite its cluster abilities).
There are many options out there based on what is important to you:
File system: You can devise your own write-ahead-log solution to solve multi-threading problems and achieve "eventual consistency"; that way you don't have to lock data for one thread at a time (a minimal sketch follows this list). You can use schema-full file formats like CSV, Avro, or Parquet. You can also use S3 or WSB for cloud block storage, or HDFS for plain replicated block storage.
NoSQL: You can store each entry as a document in a NoSQL document store. If you want to keep data in memory for faster reads, explore Memcached or Redis. If you want to perform searches on the data, use Solr or Elasticsearch. MongoDB is popular, but it has scalability issues similar to MySQL's; instead, I would choose Cassandra or HBase if you need more scalability. With some NoSQL stores, you might have to parse your "documents" at read time, which may hurt analytics performance.
RDBMS: As MySQL is not scalable enough, you might explore Teradata and Oracle. The latest version of Oracle offers petabyte query capabilities and in-memory caching.
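Here is a minimal sketch of the write-ahead-log idea from the file-system option above (my own illustration, not a library API): sensor threads only enqueue readings, and a single writer thread appends them to a CSV file, so no mutex around the file is needed:

import csv
import queue
import threading
import time

log_queue = queue.Queue()  # thread-safe; sensor threads never touch the file

def sensor(sensor_id, period_s):
    # Each sensor thread just enqueues readings at its own frequency.
    while True:
        log_queue.put((time.time(), sensor_id, 42.0))  # hypothetical value
        time.sleep(period_s)

def writer(path):
    # Single writer: appends in arrival order, no locking required.
    with open(path, "a", newline="") as f:
        out = csv.writer(f)
        while True:
            out.writerow(log_queue.get())
            f.flush()  # crude durability; batch flushes in real use

for i, period in enumerate([0.1, 0.5, 1.0, 2.0, 5.0]):  # the 5 sensors
    threading.Thread(target=sensor, args=(i, period), daemon=True).start()
threading.Thread(target=writer, args=("sensors.csv",), daemon=True).start()
time.sleep(10)  # let the demo run for a bit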
Using a database can add extra computational overhead if you have a "lot of data". Another question is what you do with the data: if you only accumulate it, a map/vector can be enough.
The first step might be to use a map/vector that you serialize to a file when needed. You can then add the database if you wish.
About mutexes: if you share code between different threads and (in this code) they work on the same data at the same time, then you need them; otherwise remove them. By the way, if you can separate read and write operations, then you don't need a mutex/semaphore mechanism.
You can store data anywhere, but the choice of data storage structure depends on the use cases (the things you want to do with the data).
It could be HDFS files, an RDBMS, a NoSQL DB, etc.
For example, your common use cases could be:
1. save the sensor data very quickly;
2. get the sensor data for a specific date.
Then you could use MongoDB or Cassandra.
If you want deep analytics (say, monthly average sensor data), you should definitely think about other solutions.
As for MySQL, it can also be used for reasonably big data storage, as it supports sharding. It fits some scenarios well, some not.
But I repeat: it all depends on the use cases, i.e. the things you want to do with the data.
So you could add more details to the question (define the desired use cases), or ask again.
There are several Questions that discuss "lots of data" and [mysql]. They generally say "yes, but it depends on what you will do with it".
Some general statements (YMMV):
a million rows -- no problem.
a billion rows or a terabyte of data -- You will run into problems, but they are not insurmountable.
100 inserts per second on spinning disk -- probably no problem
1000 rows/second inserted can be done; troubles are surmountable
creating "reports" from huge tables is problematical until you employ Summary Tables.
Two threads storing into the same table at the "same" time? Every RDBMS (MySQL included) solves that problem before the first release. The Mutexes (or whatever) are built into the code; you don't have to worry.
"Real time" -- If you are inserting 100 sensor values per second and comparing each value to one other value: No problem. Comparing to a million other values: big problem with any system.
"5 sensors" -- Read each hour? Yawn. Each minute? Yawn. Each second? Probably still Yawn. We need more concrete numbers to help you!

Distributed database use cases

At the moment I have a MySQL database, and the data I am collecting amounts to 5 terabytes a year. I will keep my data indefinitely; I don't think I want to delete anything early.
I ask myself whether I should use a distributed database, because my data will grow every year. After 5 years I will have 25 terabytes without indexes (just calculated from the raw data I save every day).
I have 5 tables, and most queries are joins over multiple tables.
I mostly need to access 1-2 columns over many rows at a specific timestamp.
Would a distributed database be preferable to a single MySQL database?
Partitioning will be difficult, because all my tables are highly interconnected.
I know it depends on the queries and on the database table design, and I could also run a distributed MySQL database.
I just want to know when I should start thinking about a distributed database.
Would this be a use case, or could MySQL handle this large dataset?
EDIT:
On average I will have 1500 clients writing data per second, and they affect all tables.
I just need the old datasets for analytics, like machine learning and pattern matching.
A client should also be able to see the historical data.
Your question is about "distributed", but I see more serious questions that need answering first.
"Highly indexed 5TB" will slow to a crawl. An index is a BTree. To add a new row to an index means locating the block in that tree where the item belongs, then read-modify-write that block. But...
If the index is AUTO_INCREMENT or TIMESTAMP (or similar things), then the blocks being modified are 'always' at the 'end' of the BTree. So virtually all of the reads and writes are cacheable. That is, updating such an index is very low overhead.
If the index is 'random', such as UUID, GUID, MD5, etc., then the block to update is rarely found in cache, so updating this one index for this one row is likely to cost a pair of IOPs. Even with SSDs, you are likely not to keep up: at your 1500 rows per second, a handful of random indexes already implies thousands of IOPs. (That is assuming you don't have several TB of RAM.)
If the index is somewhere between sequential and random (say, some kind of "name"), then there might be thousands of "hot spots" in the BTree, and these might be cacheable.
Bottom line: If you cannot avoid random indexes, your project is doomed.
Next issue... The queries. If you need to scan 5TB for a SELECT, that will take time. If this is a Data Warehouse type of application and you need to, say, summarize last month's data, then building and maintaining Summary Tables will be very important. Furthermore, this can obviate the need for some of the indexes on the 'Fact' table, thereby possibly eliminating my concern about indexes.
"See the historical data" -- See individual rows? Or just see summary info? (Again, if it is like DW, one rarely needs to see old datapoints.) If summarization will suffice, then most of the 25TB can be avoided.
Do you have a machine with 25TB online? If not, that may force you to have multiple machines. But then you will have the complexity of running queries across them.
Is the 5TB estimated from INT = 4 bytes, etc.? If you are using InnoDB, multiply by 2 to 3 to get the actual footprint. Furthermore, if you need to modify a table in the future, that action probably needs to copy the table over, which doubles the disk space needed. Your 25TB becomes more like 100TB of storage.
PARTITIONing has very few valid use cases, so I don't want to discuss that until knowing more.
"Sharding" (splitting across machines) is possibly what you mean by "distributed". With multiple tables, you need to think hard about how to split up the data so that JOINs will continue to work.
The 5TB is huge -- Do everything you can to shrink it -- Use smaller datatypes, normalize, etc. But don't "over-normalize", you could end up with terrible performance. (We need to see the queries!)
There are many directions to take a multi-TB db. We really need more info about your tables and queries before we can be more specific.
It's really impossible to provide a specific answer to such a wide question.
In general, I recommend only worrying about performance once you can prove that you have a problem; if you're worried, it's much better to set up a test rig, populate it with representative data, and see what happens.
"Can MySQL handle 5 - 25 TB of data?" Yes. No. Depends. If - as you say - you have no indexes, your queries may slow down a long time before you get to 5TB. If it's 5TB / year of highly indexable data it might be fine.
The most common solution to this question is to keep a "transactional" database for all the "regular" work, and a datawarehouse for reporting, using a regular Extract/Transform/Load job to move the data across, and archive it. The data warehouse typically has a schema optimized for querying, usually entirely unlike the original schema.
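A skeleton of that nightly Extract/Transform/Load job might look like this (my own sketch; the driver, schemas, and names are all hypothetical):

import datetime
import mysql.connector  # assumed driver; schemas/names are hypothetical

def nightly_etl():
    oltp = mysql.connector.connect(database="app")      # transactional side
    dw = mysql.connector.connect(database="warehouse")  # reporting side
    src, dst = oltp.cursor(), dw.cursor()

    # Extract: only yesterday's slice, so each run stays bounded.
    day = datetime.date.today() - datetime.timedelta(days=1)
    src.execute(
        "SELECT client_id, ts, value FROM measurements "
        "WHERE ts >= %s AND ts < %s",
        (day, day + datetime.timedelta(days=1)),
    )

    # Transform + Load into a denormalized, query-optimized fact table.
    # (Stream in batches rather than fetchall() for multi-TB tables.)
    dst.executemany(
        "INSERT INTO fact_measurements (client_id, day, hr, value) "
        "VALUES (%s, %s, %s, %s)",
        [(c, ts.date(), ts.hour, v) for c, ts, v in src.fetchall()],
    )
    dw.commit()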
If you want to keep everything logically consistent, you might use sharding and clustering, a sort-of out-of-the-box feature of MySQL.
I would not, however, roll my own "distributed database" solution. It's much harder than you might think.

How to handle a table with billions of rows and lots of read and write operations

Please guide me through my problem.
I receive data at my server every second from different sources. My data is structured; I parse it, and I have to store the parsed data in a single table, around 5 lakh (500,000) records a day. I also do lots of read operations on this table daily. After some time this table will have billions of records.
How should I solve this problem? I want to know whether I should go with an RDBMS, HBase, or some other option.
My question is about what sort of storage medium you want the database to use: RAM? Flash? Disk?
RAM responds in nanoseconds.
Flash in microseconds.
Disk in milliseconds.
And, of course, you might want to create a hybrid of all three, especially if some keys were "hotter" than others -- more likely to be read over and over.
If you want to do a lot of fast processing, and scale it "wide" (many CPUs in a cluster for faster read performance), you are a likely candidate for a NoSQL database. I'd need to know more about your data model to know whether it would work as a key-value store, and how it might require more internal structure such as JSON/BSON.
Caveat: I am biased towards Aerospike, my employer. Yet you should do some kicking-of-the-tires with us or any other key-value stores you're considering to see if it would work with your data before betting the farm. Obviously, each NoSQL vendor would claim itself to be "the best," but much depends on your use case. A vendor's "solution" will only work well for certain data models. We tend to be best for fast in-memory RAM/Flash or hybrid implementations.
If your table will reach billions of records, an RDBMS definitely won't scale.
Regarding HBase, it depends on your requirements whether it would be a good solution or not.
If you are looking for real-time reads, HBase only helps if you are looking up a specific key. If you want random reads on different columns, HBase won't be an ideal solution here. HBase scales really well for updates.
I would suggest designing your HBase schema efficiently and storing your data in a way that suits your querying.
However, if you are interested in running aggregation queries, you can also map your HBase table to an external table in Hive and run SQL-type queries on your data.
You can use HBase as a NoSQL database in this case. To make search more customized and faster, use Elasticsearch along with HBase.
If your writes are at 1/second, most of the available databases should be able to support this. Since you are looking for a longer-term/persistent store, you should consider a database that gives you horizontal scale, so that you can add more nodes when you would like to increase capacity. Databases with auto-sharding abilities would be a great fit for you (Cassandra, Aerospike, ...). Make sure you choose an auto-sharding database that doesn't require the client/application to manage which data is stored where. In-memory databases would not fit the bill in this case.
When your storage reaches a few terabytes, you may have to worry about database scale and throughput, so that your infrastructure cost doesn't bog you down.
Your query patterns will be crucial in choosing the right solution. You may not want to index everything; fine-tune what you index so that you can query on the keys and/or only those data elements within your records, keeping the index storage overhead, and hence the cost, under control. You should also look for time-range query ability in the database solutions, as that seems to be part of your typical query pattern.
Last but not least, you will want your queries processed as fast as possible. You should try out Cassandra (good for horizontal scaling, less so on throughput) and Aerospike (good for horizontal scaling, pretty good on throughput).

Redis vs MySQL for Financial Data?

I realize that this question is pretty well discussed, however I would like to get your input in the context of my specific needs.
I am developing a realtime financial database that grabs stock quotes from the net multiple times a minute and stores them in a database. I am currently working with SQLAlchemy over MySQL, but I came across Redis and it looks interesting. It looks especially good because of its performance, which is crucial in my application. I know that MySQL can be fast too; I just feel that implementing heavy caching is going to be a pain.
The data I am saving is by far mostly decimal values. I am also doing a significant amount of divisions and multiplications with these decimal values (in a different application).
In terms of data size, I am grabbing about 10,000 symbols multiple times a minute. This amounts to about 3 TB of data a year.
I am also concerned by Redis's key quantity limitation (2^32). Is Redis a good solution here? What other factors can help me make the decision either toward MySQL or Redis?
Thank you!
Redis is an in-memory store: all the data must fit in memory. So unless you have 3 TB of RAM per year of data, it is not the right option. The 2^32 limit is not really an issue in practice, because you would probably have to shard your data anyway (i.e. use multiple instances), and because the limit is actually 2^32 keys with 2^32 items per key.
If you have enough memory and still want to use (sharded) Redis, here is how you can store space-efficient time series: https://github.com/antirez/redis-timeseries
You may also want to patch Redis in order to add a proper time series data structure. See Luca Sbardella's implementation at:
https://github.com/lsbardel/redis
http://lsbardel.github.com/python-stdnet/contrib/redis_timeseries.html
Redis is excellent for aggregating statistics in real time and storing the results of these calculations (i.e. DIRT applications). However, storing historical data in Redis is much less interesting, since it offers no query language to perform offline calculations on the data. BTree-based stores supporting sharding (MongoDB, for instance) are probably more convenient than Redis for storing large time series.
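For completeness, here is one minimal way to keep a time series in (sharded) Redis with a sorted set scored by timestamp, via the redis-py client (my own illustration, not the pattern from the repositories linked above):

import time
import redis  # assumed redis-py client

r = redis.Redis()

def record_quote(symbol, price, ts=None):
    ts = ts if ts is not None else time.time()
    # The score is the timestamp, which makes time-range queries cheap;
    # embedding the timestamp in the member keeps members unique.
    r.zadd("quotes:%s" % symbol, {"%f:%f" % (ts, price): ts})

def quotes_between(symbol, t0, t1):
    # O(log n + m) retrieval of all quotes in [t0, t1].
    return r.zrangebyscore("quotes:%s" % symbol, t0, t1)

record_quote("AAPL", 172.35)
print(quotes_between("AAPL", time.time() - 60, time.time()))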
Traditional relational databases are not so bad to store time series. People have dedicated entire books to this topic:
Developing Time-Oriented Database Applications in SQL
Another option you may want to consider is using a big data solution:
storing massive ordered time series data in bigtable derivatives
IMO the main point (whatever the storage engine) is to evaluate the access patterns to these data. What do you want to use these data for? How will you access these data once they have been stored? Do you need to retrieve all the data related to a given symbol? Do you need to retrieve the evolution of several symbols in a given time range? Do you need to correlate values of different symbols by time? etc ...
My advice is to try to list all these access patterns. The choice of a given storage mechanism will only be a consequence of this analysis.
Regarding MySQL usage, I would definitely consider table partitioning because of the volume of the data. Depending on the access patterns, I would also consider the ARCHIVE engine. This engine stores data in compressed flat files. It is space-efficient. It can be used with partitioning, so although it does not index the data, it can be efficient at retrieving a subset of data if the partition granularity is carefully chosen.
You should consider Cassandra or HBase. Both allow contiguous storage and fast appends, so when it comes to querying, you get huge performance. Both will easily ingest tens of thousands of points per second.
The key point is that along one of your query dimensions (usually by ticker), you're accessing disk (SSD or spinning) contiguously. You're not having to hit indices millions of times. You can model things in Mongo/SQL to get similar performance, but it's more hassle, and you get it "for free" out of the box with the columnar guys, without having to do any client-side shenanigans to merge blobs together.
My experience with Cassandra is that it's 10x faster than MongoDB, which is already much faster than most relational databases, for the time series use case, and as data size grows, its advantage over the others grows too. That's true even on a single machine. Here is where you should start.
The only negative on Cassandra, at least, is that with a big cluster you sometimes don't have consistency for a few seconds, so you either need to force it, slowing it down, or accept that the very latest print will occasionally be a few seconds old. On a single machine there will be zero consistency problems, and you'll get the same columnar benefits.
I'm less familiar with HBase, but it claims to be more consistent (there will be a cost elsewhere; CAP theorem), though setting up the HBase stack is much more of a commitment.
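To illustrate the contiguous-by-ticker layout, here is a minimal Cassandra data model via the Python cassandra-driver (an assumption; the keyspace and names are hypothetical). The symbol is the partition key and the timestamp the clustering key, so one ticker's quotes sit together on disk in time order:

from cassandra.cluster import Cluster  # assumed driver; names hypothetical

session = Cluster(["127.0.0.1"]).connect("market")

# One partition per symbol, rows clustered by time within it, so a
# (symbol, time-range) query is a single contiguous read.
session.execute("""
    CREATE TABLE IF NOT EXISTS quotes (
        symbol text,
        ts timestamp,
        price double,
        PRIMARY KEY (symbol, ts)
    ) WITH CLUSTERING ORDER BY (ts DESC)
""")

session.execute(
    "INSERT INTO quotes (symbol, ts, price) VALUES (%s, toTimestamp(now()), %s)",
    ("AAPL", 172.35),
)
latest = session.execute(
    "SELECT ts, price FROM quotes WHERE symbol = %s LIMIT 100", ("AAPL",)
)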
You should first check the features that Redis offers in terms of data selection and aggregation. Compared to an SQL database, Redis is limited.
In fact, 'Redis vs MySQL' is usually not the right question, since they are apples and pears. If you are refreshing the data in your database (also removing it regularly), check out MySQL partitioning. See e.g. the answer I wrote to "What is the best way to delete old rows from MySQL on a rolling basis?":
Check out MySQL Partitioning:
Data that loses its usefulness can often be easily removed from a partitioned table by dropping the partition (or partitions) containing only that data. Conversely, the process of adding new data can in some cases be greatly facilitated by adding one or more new partitions for storing specifically that data.
See e.g. this post to get some ideas on how to apply it:
Using Partitioning and Event Scheduler to Prune Archive Tables
And this one:
Partitioning by dates: the quick how-to
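As a concrete sketch of that pruning pattern (hypothetical table and partition names, via the mysql-connector-python driver): old data is removed by dropping a whole partition, which is near-instant compared to a bulk DELETE:

import mysql.connector  # assumed driver; names below are hypothetical

conn = mysql.connector.connect(database="market")
cur = conn.cursor()

# One partition per day; new days are added by reorganizing pmax,
# as the linked how-to describes.
cur.execute("""
    CREATE TABLE IF NOT EXISTS quotes (
        symbol CHAR(8) NOT NULL,
        ts DATETIME NOT NULL,
        price DECIMAL(12,4) NOT NULL
    )
    PARTITION BY RANGE (TO_DAYS(ts)) (
        PARTITION p20240101 VALUES LESS THAN (TO_DAYS('2024-01-02')),
        PARTITION p20240102 VALUES LESS THAN (TO_DAYS('2024-01-03')),
        PARTITION pmax VALUES LESS THAN MAXVALUE
    )
""")

# Rolling maintenance: dropping a partition discards that day instantly,
# instead of a slow DELETE touching millions of rows.
cur.execute("ALTER TABLE quotes DROP PARTITION p20240101")
conn.commit()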

Ways to optimize MySQL table [closed]

I have a big, big table; its size is around 130 GB, and data is dumped into it every day.
I'd like to optimize the table... Can anyone suggest how I should go about it?
Any input will be a great help.
It depends on how you are trying to optimize it.
For querying speed, appropriate indexes, including multi-column indexes, would be a very good place to start. Run EXPLAIN on all your queries to see what is taking up so much time. Optimize the code that reads the data to store results instead of re-querying.
If old data is less important, or you're getting too much data to handle, you can rotate tables by year, month, week, or day. That way the writes always go to a fairly small table. The older tables are all dated (e.g. tablefoo_2011_04) so that you have a backlog.
If you are trying to optimize for size in the same table, make sure you are using appropriate types. If you store variable-length strings, use a VARCHAR instead of statically sized data. Don't use strings for status indicators; use an ENUM or INT with a secondary lookup table.
The server should have a lot of ram so that it's not going to disk all the time.
You can also look at using a caching layer such as memcached.
More information about what the actual problem is, your situation, and what you are trying to optimize for would be helpful.
If your table is a sort of logging table, there are several strategies for optimizing it.
(1) Store essential data only.
If there are unnecessary (e.g. nullable) columns in it that are not used for aggregation or analytics, store them in another table. Keep the main table small.
E.g. don't store the raw HTTP_USER_AGENT string; preprocess the agent string and store only the smaller piece of data you actually want to review.
(2) Use a fixed row format for the table.
Use CHAR rather than VARCHAR for almost-fixed-length strings. This will help speed up SELECT queries.
Ex) ip VARCHAR(15) => ip CHAR(15)
(3) Summarize old data and move it into other tables periodically.
If you don't have to review the whole dataset every day, divide it into periodic tables (year/month/day) and store summarized data for the old ones.
Ex) Table_2011_11 / Table_2011_11_28
(4) Don't use too many indexes on a big table.
Too many indexes cause heavy load for insert queries.
(5) Use the ARCHIVE engine.
MySQL supports the ARCHIVE engine, which uses zlib for data compression (a small sketch follows this list).
http://dev.mysql.com/doc/refman/5.0/en/archive-storage-engine.html
It generally fits logging (AFAIK); the lack of ORDER BY, REPLACE, DELETE, and UPDATE is not a big problem for logging.
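A small sketch pulling tips (1), (2), and (5) together (a hypothetical log schema, via the mysql-connector-python driver):

import mysql.connector  # assumed driver; the log schema is hypothetical

conn = mysql.connector.connect(database="logs")
cur = conn.cursor()

# Fixed-width columns (tip 2) plus the compressed, insert-only ARCHIVE
# engine (tip 5), suited to append-heavy logging tables.
cur.execute("""
    CREATE TABLE IF NOT EXISTS access_log_2011_11 (
        ts DATETIME NOT NULL,
        ip CHAR(15) NOT NULL,      -- CHAR, not VARCHAR (tip 2)
        status SMALLINT NOT NULL,  -- numeric status, not a string
        agent_id INT NOT NULL      -- preprocessed user agent (tip 1)
    ) ENGINE=ARCHIVE
""")
conn.commit()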
You should show us the output of SHOW CREATE TABLE tablename so we can see the columns, indexes, and so on.
From the look of everything, it seems MySQL's partitioning is what you need to implement in order to increase performance further.
A few possible strategies.
If the dataset is that large, it may be useful to store certain information redundantly: keep cache tables if certain records are accessed much more frequently than others, denormalize information (either to limit the number of joins or to create tables with fewer columns, so you have a lean table to keep in memory at all times), or keep summaries for fast lookup of totals.
The summary table(s) can be kept in sync either by periodically regenerating them or by using triggers, or even by combining both: keep a cache table for the latest day from which you can calculate actual totals, and summaries for the historical data. This gives you full precision while not requiring a read of the full index. Test to see what delivers the best performance in your situation.
Splitting your table by period is certainly an option. It's like partitioning, but the Mayflower blog advises doing it yourself, as the MySQL implementation seems to have certain limitations.
In addition: if the data in those historical tables is never changed and you want to reduce space, you could use myisampack. Indexes are supported (you have to rebuild them) and performance gains are reported, but I suspect you would gain speed on reading individual rows while facing decreasing performance on large reads (as lots of rows need unpacking).
And last: think about what you actually need from the historical data. Does it need exactly the same information you have for more recent entries, or are there things that just aren't important anymore? For example, if you have an access log, it stores all sorts of information like IP, referral URL, requested URL, user agent... Perhaps in 5 years' time the user agent isn't interesting at all to know; it may be fine to combine all requests from one IP for one page + CSS + JavaScript + images into one entry (perhaps with a separate many-to-one table for the precise files), and the referral URLs may only need a count of occurrences, decoupled from exact time or IP.
Don't forget to consider the speed of the medium on which the data is stored. You could use RAID disks to speed up access, or maybe store the table in RAM, though at 130GB that might be a challenge! Then consider the processor too. I realise this isn't a direct answer to your question, but it may help you achieve your aims.
You can still try partitioning, using tablespaces or a "table-per-period" structure as @Evan advised.
If your full-text searching is failing, maybe you should move to Sphinx/Lucene/Solr. External search engines can definitely help you get faster results.
If we are talking about table structure, use the smallest datatypes possible.
If OPTIMIZE TABLE is too slow, which is true for the really big tables, you can back up the table and restore it. Of course, in that case you will need some downtime.
As a bottom line:
If your issue concerns full-text searching, then before applying any table changes, try an external search engine.