I'm working on migrating/integrating a large on-premise monolithic Oracle app to cloud-based microservices. For a long time, the microservices will need to be fed from and synchronized with the Oracle DB.
One of the alternatives is using Oracle GoldenGate for DB-to-DB(s) near-real-time replication. The advantage is that it seems to be reliable and resilient. The disadvantage is that it works on low-level CDC/DB changes (as opposed to app-level events).
An alternative is creating higher-level business events from the source DB by enriching the data and then pushing them to something like Kafka. The disadvantage is that it puts more load on the source DB and requires durability on the source.
Has anybody dealt with similar problems? Any advice is appreciated.
The biggest problem for us has been that the legacy data is on the LAN, and our microservices are in the public cloud (in an attempt to avoid a "new legacy" hybrid cloud future).
Oracle GoldenGate for Big Data can push change records as JSON to Kafka/Confluent. There's also the option to write your own handlers. You can find a lot of our PoC code on GitHub.
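On the consuming side, a change record coming off Kafka is easy to pick up; here's a rough Python sketch (topic, broker and field names are placeholders, and the exact JSON layout depends on how you configure the GoldenGate Kafka handler):

    import json

    from kafka import KafkaConsumer  # pip install kafka-python

    # Placeholder topic/broker names - adjust to your GoldenGate handler configuration.
    consumer = KafkaConsumer(
        "ORCL.SCHEMA.ORDERS",  # hypothetical one-topic-per-source-table naming
        bootstrap_servers=["broker1:9092"],
        auto_offset_reset="earliest",
        value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    )

    for message in consumer:
        change = message.value
        # The JSON typically carries an operation type plus before/after images;
        # the exact keys depend on the handler configuration.
        op = change.get("op_type")   # e.g. I / U / D
        after = change.get("after", {})
        print(op, after)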
As time has gone by, it has become apparent that the number of feeds is going to end up in the 300+ range, and we're now considering a data virtualisation + caching approach rather than pushing the legacy data to the cloud apps.
I am working on a network project and am already using MySQL as my backend. The project is coded in C++. When my data becomes pretty large, it takes a lot of time to retrieve data from MySQL, so I have been exploring other databases and came across Neo4j. After reading a lot about Neo4j on the internet, I have a few queries. The core requirements of my project are high performance and availability, which I am not getting once my database becomes huge.
My questions:
I am a little hesitant about using Neo4j, since I have read in places on the internet that it does not perform better than MySQL. Is that true?
There are no C++ Neo4j drivers, and it can be accessed only via REST APIs. Will that make my project even slower, since everything will now be an HTTP request and response?
Can we run Neo4j on Solaris? The server for the project will be Solaris.
Disclaimer: My answer might be biased since I'm working for Neo Technology. Nevertheless I will stay as objective as possible.
Regarding your questions:
Whether a graph database or a relational database performs better depends entirely on your use case. A graph database excels when you run local queries (e.g. "who are the friends of my friends?"). By a local query I'm referring to a case where you start at one or more bound nodes and then traverse through the graph. For global queries (e.g. "what is the average age of people in the db?") a graph database can perform at the same level as a relational database but will not be significantly faster. However, if your global queries need to do a lot of traversals, the benefit of a graph database will also be significant.
No, using your language's HTTP capabilities will not be slower than using a driver. Most of the drivers just add some convenience layer(s) for creating the request and parsing the response, and maybe some caching. (A minimal example of such a raw HTTP call is sketched after these answers.)
Neo4j, as a JVM-based database, can run on any platform with a Java 7 capable JVM. However, Neo Technology's support offering currently covers Linux, Windows and HP-UX. If you need commercial-grade support for Solaris, please get in touch with me or my colleagues directly.
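Regarding point 2, to give you an idea of how little is involved: a call against Neo4j's transactional Cypher endpoint is a single HTTP request/response pair. The sketch below uses Python's requests library purely for brevity (from C++ you would make the same call with something like libcurl); the host, credentials and graph model are placeholders:

    import requests  # any HTTP client will do; libcurl from C++ works the same way

    # Transactional Cypher endpoint of a Neo4j server (placeholder host/credentials).
    URL = "http://localhost:7474/db/data/transaction/commit"

    # A typical "local" query: friends of the friends of one bound start node.
    payload = {
        "statements": [
            {
                "statement": (
                    "MATCH (me:Person {name: {name}})-[:FRIEND]->()-[:FRIEND]->(fof) "
                    "RETURN DISTINCT fof.name"
                ),
                "parameters": {"name": "Alice"},
            }
        ]
    }

    resp = requests.post(URL, json=payload, auth=("neo4j", "password"))
    resp.raise_for_status()
    for row in resp.json()["results"][0]["data"]:
        print(row["row"][0])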
I provide a solution that handles operations for brick-and-mortar shops. My next step is to provide analytics for my customers.
As I am in the starting phase, I am hoping to find a free way to do it myself instead of using third-party solutions. I am not expecting massive scale at this point, but I would like to get it done right instead of running queries off the production database.
For performance reasons, I am thinking I should run the analytics queries against separate tables in the same database. A cron job would run every night to replicate the data from the production tables to the analytics tables.
Is that the proper way to do this?
The other option I have in mind is to run the analytics from a different database (as opposed to just separate tables). I am using Amazon RDS with MySQL, if that makes a difference.
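For concreteness, the nightly job I have in mind would be roughly the following (schema, table and connection details are just placeholders, assuming plain MySQL via mysql-connector-python):

    import mysql.connector  # pip install mysql-connector-python

    # Placeholder endpoint/credentials - both schemas live on the same RDS instance.
    conn = mysql.connector.connect(
        host="mydb.xxxxxx.rds.amazonaws.com", user="etl", password="..."
    )
    cur = conn.cursor()

    # Hypothetical tables: refresh the analytics copies from production each night.
    for table in ("orders", "customers", "visits"):
        cur.execute("DELETE FROM analytics.{0}".format(table))
        cur.execute(
            "INSERT INTO analytics.{0} SELECT * FROM production.{0}".format(table)
        )

    conn.commit()
    cur.close()
    conn.close()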
It depends on how much analytics you want to provide.
I am a DWH manager and would start off with a small (free) BI (Business Intelligence) solution.
Your production DB and analytics DB should always be separate.
Take a look at Pentaho Data Integration (Community Edition). It's a free ETL tool that will help you get your data from your production database to your analytics database, and it can also perform transformations.
Check out some free reporting software like Jaspersoft to help you provide a reporting platform for customers (if that's what you want; otherwise just use Excel).
BI never wants to throw away data. If you think the data in your analytics DB is going to grow large (2 TB+), use PostgreSQL rather than MySQL; MySQL does not handle very large data volumes well.
If you are really serious about this, read "The Data Warehouse Toolkit" by Ralph Kimball. That will set you up with some basic data warehouse knowledge.
Amazon RDS provides something called a read replica, which automatically performs replication and is optimised for reads.
I like this solution for its convenience. The downside is its price tag.
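For what it's worth, creating one can even be scripted; a minimal sketch with boto3 (instance identifiers and instance class are placeholders) looks like this:

    import boto3  # pip install boto3

    rds = boto3.client("rds")

    # Create a read replica of the production instance (identifiers are placeholders).
    rds.create_db_instance_read_replica(
        DBInstanceIdentifier="mydb-analytics-replica",
        SourceDBInstanceIdentifier="mydb-production",
        DBInstanceClass="db.t3.medium",
    )

    # Point analytics/reporting queries at the replica's endpoint and keep the
    # application writing to the production endpoint.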
I am currently facing a problem for which I have not yet figured out a good solution, so I hope to get some advice from you all.
My problem (originally shown in a diagram) is as follows.
The Core Database is where all the clients connect to manage the live data, which is really big and busy all the time.
The Feature Database is not used as often, but it needs some part of the live data (maybe 5%) from the Core Database, and the requests to this server take a long time and consume a lot of resources.
My current solution:
I set up database replication between the Core Database and the Feature Database, and it works fine. But the problem is that I waste a lot of disk space storing unwanted data.
(Filtering while replicating the data does not work with my database schema.)
Using a queueing system would not keep the data up to date in time, as there are many requests to the Core Database.
Please suggest some ideas if you have encountered this before.
Thanks,
Pang
What you describe is a classic data integration task. You can use any data integration tool to extract data from your Core Database and load it into the Feature Database. You can schedule your data integration jobs anywhere from near-real-time to whatever time frame suits you.
I used Talend in my mid-size (10GB) semi-scientific PostgreSQL database integration project. It worked beautifully.
You can also try SQL Server Integration Services (SSIS). This tool is very powerful as well. It works with all top-notch RDBMSs.
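If you'd rather hand-roll the job than adopt one of these tools, the core of it is small. The sketch below assumes PostgreSQL on both ends via psycopg2; every connection string, table and column name is a placeholder for whatever your schema actually uses, and the WHERE clause stands in for your real "5% of the data" filter:

    import psycopg2  # pip install psycopg2-binary

    # Placeholder connection strings for the two databases.
    core = psycopg2.connect("host=core-db dbname=core user=etl password=...")
    feature = psycopg2.connect("host=feature-db dbname=feature user=etl password=...")

    # Extract only the rows the Feature Database actually needs.
    with core.cursor() as src:
        src.execute(
            "SELECT id, name, status, updated_at FROM live_data WHERE status = 'ACTIVE'"
        )
        rows = src.fetchall()

    # Reload the Feature Database copy inside a single transaction.
    with feature:
        with feature.cursor() as dst:
            dst.execute("TRUNCATE feature_live_data")
            dst.executemany(
                "INSERT INTO feature_live_data (id, name, status, updated_at) "
                "VALUES (%s, %s, %s, %s)",
                rows,
            )

    core.close()
    feature.close()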
If all you're worrying about is disk space, I would stick with the solution you have right now. 100 GB of disk space costs less than a dollar these days; for that kind of money, you can't really justify bringing a new solution into the system.
Logically, there's also a case to be made for keeping the filtering in the same application: keeping the responsibility for knowing which records are relevant inside the app itself, rather than in some mysterious integration layer, will reduce overall solution complexity. Only accept the additional complexity of a separate integration layer if you really need it.
I've just started working on a project that will involve multiple people entering data from multiple geographic locations. I've been asked to prepare forms in Access 2003 to facilitate this data entry. Right now, copies of the DB (with my tables and forms) will be distributed to each of the sites, returned to me, and then I get to hammer them all together. I can do that, but I'm hoping that there is a better way - if not for this project, then for future projects.
We don't have any funding for real programming support, so it's up to me. I am comfortable with HTML, CSS, and SQL, have played around with Django a fair bit, and am a decently fast learner. I don't have much time to design forms, but they don't have to actually function for a few months.
I think there are some substantial benefits to web-based forms (primary keys are set centrally, I can monitor data entry, form changes are immediately and universally deployed, I don't have to do tech support for different versions of Access). But I'd love to hear from voices of experience about the actual benefits and hazards of this stuff.
This is very lightweight data entry - three forms attached to three tables, linked by person ID, certainly under 5000 total records. While this is hardly bank account-type information, I do take the security of these data seriously, so that's an additional consideration. Any specific technology recommendations?
Options that involve Access:
use Jet replication. If the machines where the data editing is being done can be connected via wired LAN to the central network, synchronization would be very easy to implement (via simple Direct Synchronization, only a couple of lines of code). If not (as seems to be the case), it's an order of magnitude more complex and requires significant setup of the remote systems. For an ongoing project, it can be a very good solution. For a one-off, not so much. See the Jet Replication Wiki for lots of information on Jet Replication. One advantage of this solution is that it works completely offline (i.e., no Internet connection needed).
use Access for the front end and SQL Server (or some other server database) for the back end. Provide a mechanism for remote users to connect to the centrally-hosted database server, either over VPN (preferred) or by exposing a non-standard port to the open Internet (not recommended). For lightweight editing, this shouldn't require overmuch optimization of the Access app to get a usable application, but it isn't going to be as fast as a local connection, and how slow will depend on the users' Internet connections. This solution does require an Internet connection to be used.
host the Access app on a Windows Terminal Server. If the infrastructure is available and there's a budget for CALs (or if the CALs are already in place), this is a very, very easy way to share an Access app. Like #2, this requires an Internet connection, but it puts all the administration in one central location and requires no development beyond what's already been done to create the existing Access app.
For non-Access solutions, it's a matter of building a web front end. For the size app you've outlined, that sounds pretty simple for the person who already knows how to do that, not so much for the person who doesn't!
Even though I'm an Access developer, based on what you've outlined, I'd probably recommend a light-weight web-based front end, as simple as possible with no bells and whistles. I use PHP, but obviously any web scripting environment would be appropriate.
I agree with David: a web-based solution sounds the most suitable.
I use CodeCharge Studio for that: it has a very Access-like interface, lots of wizards to create online forms etc. CCS offers a number of different programming languages; I use PHP, as part of a LAMP stack.
I have a client that wants to use FileMaker for a few things in their office, and may have me building a web app.
The last time I used, or thought about, or even heard of, FileMaker was about 10 years ago, and I seem to remember that I don't want to use it as the back end of a sophisticated web app, so I am thinking of trying to sell them on MySQL.
However, will their FileMaker database talk to MySQL? Any idea how best to talk them down from FileMaker?
You may have a hard time talking them out of FileMaker, because it was actually a pretty clever tool for making small, in-house database applications, and it had a very loyal user base. But you're right--it's not a good tool for making a web application.
I had a similar problem with a client who was still using a custom dBase IV application. Fortunately, Perl's CPAN archive has modules for talking to anything. So I wrote a script that exported the entire dBase IV database every night, and uploaded it into MySQL as a set of read-only tables.
Unfortunately, this required taking MySQL down for 30 minutes every night. (It was a big database, and we had to convert free-form text to HTML.) So we switched to PostgreSQL, and performed the entire database update as a single transaction.
But what if you need read-write access to the FileMaker database? In that case, you've got several choices, most of them bad:
Build a bi-directional synchronization tool.
Get rid of FileMaker entirely. If the client's FileMaker databases are trivial, this may be relatively easy. I'd begin by writing a quick-and-dirty clone of their most important databases and demoing it to them in a web browser.
The client may actually be best served by a FileMaker-based web application. If so, refer them to Google.
But how do you sell the client on a given choice? It's probably best to lay out the costs and benefits of each choice, and let the client decide which is best for their business. You might lose the job, but you'll maintain a reputation for honest advice, and you won't get involved in a project that's badly suited to your client.
We develop solutions with both FileMaker and PHP/MySQL. Our recommendation is to do the web app in a web app optimised technology like MySQL.
Having said that, FileMaker does have a solid PHP API so if the web app has relatively lightweight demands (e.g. in house use) then use that and save yourself the trouble of synchronisation.
FileMaker's ESS technology lets FileMaker use an SQL database as the backend data source, which gives you two options:
Use ESS as a nice tight way to synchronise right within FileMaker - that way you'd have a "native" data source to work with within the FileMaker solution per se.
Use ESS to allow FileMaker to be used as a reporting/data mining/casual query and edit tool directly on the MySQL tables - it works sweet.
We've found building a sophisticated application in FileMaker with ESS/MySQL backend to be very tricky, so whether you select 1 or 2 from above depends on how sophisticated and heavy duty that FileMaker usage is.
Otherwise, SyncDek has a good reputation as a third-party solution for automating synchronisation.
I've been tackling similar problems and found a couple of solutions that emk hasn't mentioned...
FileMaker can link to external SQL data sources (ESS), so you can use ODBC to connect to a MySQL (or other) database and share data. We tried it and found it to be pretty slow, to be honest.
SyncDek is a product that claims to allow you to perform data replication and data transmission between FileMaker, MySQL and other structured sources.
It is possible to use FileMaker's Instant Web Publishing as a web service that your app can then push and pull data through. We found a couple of wrappers for this in Python and PHP.
You can put a trigger in the FileMaker database so that every time a record (or the part of a record you are interested in) is changed, it calls a web service that updates a MySQL or memcached copy of that data that your website can access (a rough sketch of such a service follows this list).
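To illustrate that last option, the receiving web service can be tiny. This sketch uses Flask and pymemcache; the URL, record fields and cache key scheme are all made up for the example:

    import json

    from flask import Flask, request           # pip install flask
    from pymemcache.client.base import Client  # pip install pymemcache

    app = Flask(__name__)
    cache = Client(("localhost", 11211))  # placeholder memcached host


    @app.route("/filemaker/record-changed", methods=["POST"])
    def record_changed():
        # The FileMaker trigger posts the changed record as JSON (hypothetical fields).
        record = request.get_json(force=True)
        key = "customer:{0}".format(record["id"])
        cache.set(key, json.dumps(record).encode("utf-8"))
        return "ok", 200


    if __name__ == "__main__":
        app.run(port=5000)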
I found that people like FileMaker because it gives them a very visual interface onto their data: it's very easy to make quite large self-contained applications without too much development knowledge. But when it comes to collaboration with many users, or presenting this data in a format other than the FileMaker application, we found performance to be a real problem.