Oracle 11g for a MySQLer, concept of database

I come from a strong background in MySQL, and I am now starting with Oracle. But I find it really difficult to understand what a DATABASE is in Oracle, because Oracle uses several similar concepts that I am struggling to differentiate. In MySQL there is the single, simple concept of a "database"; in Oracle there is instead a mixture of:
the SCHEMA concept (a user's workspace, logically divided by TABLESPACES)
the TNS and SID/SERVICE concept
the CONNECTION concept (in both the ODBC definition and SQL Developer)
I won't ask for a pure definition of each, as I am still reading; I'd just like some guidance on how I can map a MySQL database onto the closest Oracle concept.

This is the information I can give you coming from the perspective of a developer. I don't know huge amounts about Oracle, but I have done some fairly significant work with deploying to it for some applications that are now in production.
Database
A database, in Oracle terms, is a group of files that live on disk and are managed as a cohesive unit. The database contains almost everything: logins, roles, tables, indexes, temporary space, transaction logs, and so on. Creating one is a nontrivial task in Oracle. It basically requires direct access (as in SSH or Windows Remote Desktop) to the machine. It's common for a DBA to create one during installation and for that to be the only one the server ever hosts. Unlike in MySQL, PostgreSQL, and SQL Server, you can't really use this level for basic grouping. E.g., giving each developer their own database is uncommon because of the overhead in recreating it.
Schema
Oracle schemas conflate two purposes: users and namespaces.
Each schema is a user, and it can be associated with credentials (a password, a user in Active Directory). Note that all accounts are database specific; there is no way to create a user that can log into multiple databases (aside from pointing both databases at the same LDAP server or otherwise involving some external service).
The schema also acts as a namespace that contains objects (e.g., tables, views, procedures, and indexes), and the schema name can be used explicitly to qualify exactly which object you're referring to. For example, if I say MYOWNER.MYTABLE, Oracle will look for the MYTABLE table owned by MYOWNER. If you need multiple copies of all the same objects, this is the easiest level to group them at, which makes schemas the best level for per-developer copies of the database.
It is common to divide the two concepts manually: a schema can be locked out of logging in, and permissions can be granted to another user on its objects. This is something of a hassle, though, since there's no way to grant permissions across an entire schema; each object must be granted explicitly to some user or role. There's also no way to force users to create objects in a specific schema besides their own; permissions can only be granted to either create objects in the user's own schema or globally in any schema.
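To make that split concrete, here's a minimal sketch in SQL (all names here are hypothetical, and the grants are per object because that's all Oracle offers at this level):

CREATE USER myowner IDENTIFIED BY owner_pass;          -- the schema that owns the objects
ALTER USER myowner ACCOUNT LOCK;                       -- nobody logs in as the owner
CREATE USER app_user IDENTIFIED BY app_pass;
GRANT CREATE SESSION TO app_user;                      -- lets app_user log in
GRANT SELECT, INSERT ON myowner.mytable TO app_user;   -- repeat for every object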
Complete aside: in PostgreSQL and SQL Server, schemas are only namespaces, not users.
Tablespace
Tablespaces are sets of files on disk that contain everything you need to store, including both data and metadata (such as table definitions). A single database can use multiple tablespaces, and different objects within a schema can even live in different tablespaces. A tablespace can be one file or many, but they're managed as one logical unit. Each schema has a default tablespace, used for its objects when a tablespace isn't specified at creation time. Sharing tablespaces between databases is somewhere between impossible and unheard of.
In practice, it's common to not even bother with tablespaces and just leave the default configuration alone. The default is one tablespace named USERS with one file, and it's the default tablespace for all schemas in the database. If you change these at all, you usually set a default for each schema and then never think about it again until disk space becomes an issue.
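If you do change them, the whole job is a couple of statements. A minimal sketch, with a hypothetical file path and names:

CREATE TABLESPACE app_data DATAFILE '/u01/oradata/orcl/app_data01.dbf' SIZE 512M AUTOEXTEND ON;
ALTER USER myowner DEFAULT TABLESPACE app_data;   -- new objects go here unless a tablespace is specified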
Instance
You didn't ask about these specifically, but you'll need to understand them before we can talk about connecting to the database.
An instance is the actual process running on the server that listens for connections. Like databases, instances require direct access to the database server to set up. A server can host a single instance or several; it's common to have one per database.
An instance can be identified in two ways: by an SID or by a service name. An SID identifies a single instance, while a service name is an alias that can refer to several instances. The details of how that works are usually unimportant; just know that you need one of the two to connect.
Connecting
To connect from a client, you need a connect descriptor. This is a jumbled string containing the host, port, and either the SID or the service name. For example:
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=myoracleserver)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=orclservice)))
They can get more complicated, but that's the basic form. To use an SID instead of a service name, you would replace SERVICE_NAME=orclservice with SID=orclinstance. There's also a newer, more compact format called "EZ connect" that looks like this instead: myoracleserver:1521/orclservice; it only supports the basic parameters.
TNS is short for "Transparent Network Substrate," and it refers to the entire networking stack used to communicate with the database. You virtually never need to concern yourself with it as a whole.
What you encounter often is TNS names. TNS names are aliases for connect descriptors. They're stored in a plain text file on the client machine (tnsnames.ora), and they're typically global to the entire machine. Here's an example mapping that you might find in the file:
mydatabase=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=myoracleserver)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SID=orcl)))
In my experience, most of the time you can actually avoid bothering with TNS names entirely and just use the connect descriptor directly.
A connect identifier is anything that can stand in for a connect descriptor. It can be a full connect descriptor, an EZ connect descriptor, a TNS name, or several other things. But generally, they identify a server and the particular database on it that you want to connect to.
With all that in mind, connections become a little more straightforward. Conceptually, they're pretty much the same as in other databases. The thing that might be confusing about them is that you connect as a schema, as described earlier. The "username" is the schema name, and the schema can have a password or some other form of authentication associated with it. The connection string differs according to the client software, much like with any other database. For SQL*Plus (Oracle's command-line client), connection strings look like this: [USERNAME]/[PASSWORD]@[connect identifier]. So if your user is MY_SCHEMA, the password is PASS, and the server is as above, it might look like
MY_SCHEMA/PASS@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=myoracleserver)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SID=orclinstance)))
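or, using the EZ connect form with the same hypothetical names:
MY_SCHEMA/PASS@myoracleserver:1521/orclservice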
For a .NET application, it might look like
Data Source=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=myoracleserver)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SID=orcl)));User Id=MY_SCHEMA;Password=PASS
which is pretty similar to any other database. Note that anywhere you see that nasty server information, you can replace it with any connect identifier (such as a TNS name).
As far as SQL Developer is concerned, a "connection" is really just a saved connection string. ODBC connects like any other database; you just need the right connection string and drivers.
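For example, an ODBC connection string might look something like this (a sketch only; the exact driver name depends on what's installed on your machine, and DBQ accepts any connect identifier):
Driver={Oracle in OraClient11g_home1};Dbq=mydatabase;Uid=MY_SCHEMA;Pwd=PASS;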
Drivers
The drivers can be a pain point with Oracle, depending on the language. I believe Java has some decent standalone clients, but other languages generally depend on the binary client. The binary client does have an installer that puts the binaries on PATH, but the installer is pretty difficult to use and best avoided. When I can, I avoid installing the full client and use what's called the "instant client". Usually, if you can put the Instant Client binaries in a place where the app can find them, they just work. If not, it's preferable to prepend to PATH in memory for your application rather than modify it globally for the machine.
If you happen to be developing in .NET, use Oracle's ODP.NET managed provider from NuGet. It's written entirely in managed .NET, eliminating the need to deal with native binaries.
Summary
So in short:
A database is part of the server setup
A schema is both a user and how you divide your database
A tablespace is the set of physical files that holds the database's contents
TNS names are just a naming convenience on the client side
SID/Service Name are just names used when connecting
I find this arrangement far too complex, personally.

Related

How to store data of different applications in the same local MySQL instance if both applications have a multi-DB architecture?

Application 1: Suppose I have a Twitter-like application. I need multiple databases/schemas (say, one to store user info, one for user logging, etc.).
Application 2: Suppose I have a blog that also needs logically separated DBs (again, one to store user info, one for user logging, etc.).
How can I use the same MySQL instance as the datastore for both? Since each has multiple similar DBs, there is a chance of confusing database or table names unless I use long names like twitter_users and blog_users.
Any effective solution within MySQL?
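One minimal sketch (all names here are just illustrations): give each application its own database within the instance, and let the default database on each app's connection keep the table names short:

CREATE DATABASE twitter_app;
CREATE DATABASE blog_app;
CREATE TABLE twitter_app.users (id INT PRIMARY KEY, name VARCHAR(100));
CREATE TABLE blog_app.users    (id INT PRIMARY KEY, name VARCHAR(100));
-- each app connects with its own default database, so a plain "users" resolves unambiguously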
Another way is to use MaxScale as a DB proxy. It has a rewrite engine, where you can configure a schema-name rewrite for one of the applications. The benefit is that you can use a single MySQL/MariaDB instance and dedicate the whole of its memory to it.

Storing MySQL credentials in a MySQL database

This is a similar question to "Storing MS SQL Server credentials in a MySQL Database"
So, in theory, imagine I have one MySQL server. I have a "master" database, and then X number of other generic databases. What I'm looking for is a way for an app (for argument's sake, let's say a web app running on PHP) to first access the master database. This database then needs to tell the app which database to connect to, in the process giving it all the credentials, username, etc.
What is the best way to go about this?
The ideas I have so far:
Store the credentials in the master database for all the other databases. These credentials would of course be encrypted in some way, AES probably. The app would get the encrypted credentials, decrypt, connect.
Store the credentials elsewhere - maybe a completely separate server. When the master database is accessed, it returns some sort of token, which can be used to access the credential storage. Again, encrypted via AES.
Using some sort of system that I am not aware of to do exactly this.
Not doing this at all, and coming up with a completely different approach.
To give a little example: "master" would contain a list of clients. Each client would have its own separate database, with its own permissions, etc.
I've had no reason to do this kind of thing myself, but your first two ideas sound good to me, and (as long as you include the server address) they aren't even necessarily separate ideas: you could have some clients on the same server as master and some elsewhere, and the client logic won't need to care. The only issue I can see is keeping the data in the "master" schema synced with the server's security data. Also, I wouldn't bother keeping database permissions in the master schema, as I would think all clients have the same permissions, just specific to their own schema. If you have "permissions" (settings) that limit what specific clients can do (perhaps limited by contract or features paid for), I would think it would be much easier to keep those in that client's schema, but somewhere their db user cannot change the data.
Edit: It is a decent idea to have separate database users in this kind of situation; it will let you worry less about queries from one user's client inadvertently (or perhaps maliciously) modifying another's (client account should only have permissions to access their own schema.) It would probably be a good idea to keep the code for the "master" coordination (and connection) somewhat segregated from the client code base to prevent accidental leaking of access to that database into the client code; even if encrypted you probably don't want them to even have any more access than necessary to your client connection info.
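For the first idea, a minimal sketch of what the master directory table might look like (all names are hypothetical, and the AES key would live with the app, never in this database):

CREATE TABLE client_directory (
    client_id   INT PRIMARY KEY,
    db_host     VARCHAR(255)   NOT NULL,
    db_name     VARCHAR(64)    NOT NULL,
    db_user     VARCHAR(64)    NOT NULL,
    db_pass_enc VARBINARY(256) NOT NULL  -- AES-encrypted; the app decrypts before connecting
);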
I did something like this not long ago. It sounds like you're trying to build some kind of one-database-per-tenant multi-tenant system.
Storing encrypted credentials in a directory database is fine, since there's really no fundamentally different way to do it. At some point, you need to worry about storing some secret (your encryption key) no matter what you do.
In my use case, I was able to get away with a setup where the directory just mapped tenants to DB hosts. The database name and credentials for each tenant were derived from the tenant's identifier (a string). So, given a tenant ID T, something like:
host = whatever the directory says.
dbname = "db_" + T
dbuser = T
dbpass = sha1("some secret string" + T)
From a security standpoint, this is no better (actually a bit worse) than storing AES encrypted credentials in the directory database, since if someone owns your app server, they can learn everything either way. But it's pretty good, and easy to implement.
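To make the derivation concrete, here's a rough sketch of it expressed in MySQL (the secret and tenant ID are placeholders; in practice this logic lives in the app layer):

SET @secret = 'some secret string';
SET @tenant = 'acme';
SELECT CONCAT('db_', @tenant)         AS dbname,
       @tenant                        AS dbuser,
       SHA1(CONCAT(@secret, @tenant)) AS dbpass;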
This is also nice because you can extend the idea a bit and get rid of the directory server entirely: write some function that maps your tenant ID to one of N database hosts. That works great until you add or remove DB servers, and then you need to handle shuffling things around. See how memcache works, for example.
You can use Vault to do this in a much more systematic way; in fact, this is a strong use case for it. Percona has already written a great blog post on it.

How to store sensitive data of different clients in SQL server?

I work at a small company and I am trying to figure out a solution for storing sensitive data of multiple clients in Microsoft SQL Server. Actually, I feel like this is a general database question and it is not specific to MSSQL.
Until now we have been using a proprietary database where the client data is stored as db files (flat files) in the client’s root directories in the file system. So the operating system permissions guarantee that the application used by client X can never fetch data from client Y’s database. Please note that there is no database server/instance/engine here…
However, for my project I want to use SQL database. But the security folks are expressing concerns over putting data of different clients on a single database.
One option is to create separate database instances for different clients. However, I am not sure if this idea is scalable.
So my questions are:
1) Is there any mechanism in MSSQL that enables you to store databases ‘separately’ in different files used by the SQL server?
2) Let’s say I have only one database instance where I have databases of client X and client Y. How can I make sure that client X’s requests can never (accidentally) get misdirected to client Y’s database? I do not want to rely on some parameter in my code to determine which database to fetch from! :)
So, is there any solid authentication scheme to guarantee that my queries could not be misdirected to fetch from an incorrect client table?
I think this is a very common problem and there has to be a good solution for this. What are other companies doing?
Please let me know if there are any good articles to read up on this.
Different databases are always stored in different files in SQL Server so you don't even have to do anything special for this. However, NTFS permissions will not help you in this case as the clients aren't ever accessing the files directly on disk.
One possible solution in SQL Server is to create separate sets of Windows user IDs and map those to separate SQL Logins for each customer. You could then only assign those logins access to the appropriate databases. For example, if you were hosting web sites for client X and client Y, you would set up the connection string(s) in the web.config for client X's web site to use the appropriate login(s) for client X's database. Vice versa for client Y. This guarantees that no matter what (barring a hard-coded login), the code from client X's site will never access client Y's database.
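A minimal sketch of that setup in T-SQL (the names and password are hypothetical; SQL Server 2012+ syntax):

CREATE LOGIN ClientX_Login WITH PASSWORD = 'a long random password';
CREATE DATABASE ClientX_DB;
GO
USE ClientX_DB;
CREATE USER ClientX_User FOR LOGIN ClientX_Login;   -- the login maps only into this database
ALTER ROLE db_datareader ADD MEMBER ClientX_User;
ALTER ROLE db_datawriter ADD MEMBER ClientX_User;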
You can have up to 32,767 databases on a single instance of SQL Server, and having separate databases enables a number of improved serviceability scenarios (such as restoring a single customer's DB after a data problem without affecting all of your other customers).
http://technet.microsoft.com/en-us/library/ms143432.aspx

How to avoid data redundancy when copying between different DBMS?

I'm planning to create a VB.NET application that retrieves data from a database (MS Access) and stores it on a web server (a MySQL database). I'm quite confused about how to approach this. I'm planning to use the task scheduler so that the program runs automatically, and I plan to set it to run every 5 minutes.
How can I avoid the redundancy of data?
For example, I fetch the sales for the last 5 minutes; after 5 minutes, I do it again. I think there will be redundancy in that case. I would like your ideas about this scenario: how would you handle it?
If at all possible you should avoid using two databases in a situation like this.
Look for information on the linked table manager -- the data that Access uses doesn't have to be stored in Access.
http://www.mssqltips.com/sqlservertip/1480/configure-microsoft-access-linked-tables-with-a-sql-server-database/
If you have to do this, then see about using/upgrading to Access 2010 and use data macros (triggers), to put the new/changed data into temp tables that you clear out once you've copied the data over.
In a comment you said "i dont have any idea about how to replace the native tables with ODBC".
Is that the only obstacle which prevents you from consolidating the data into one set in MySQL? If so, try this suggestion for setting up ODBC links to MySQL tables.
Install an ODBC driver for MySQL, if you don't have one already. The latest version is available here: Download Connector/ODBC
Create a DSN (Data Source Name) for your MySQL database from the Windows ODBC Data Source Administrator.
Create a new Access database and use the DSN to create links with guidance from the web page link #jmoreno provided.
If the Access names of the linked tables are different than the names you originally used for the native Access tables, change them to match those original names.
Then you can import your forms, queries, reports, etc., from the old Access application. Ideally everything will just work, since Access will find the table names it needs and won't care that they are external rather than native tables. However, you may need to resolve any data type incompatibilities between Access and MySQL.
You would need the MySQL ODBC driver on each machine where the Access application is used. Personally I would prefer to deal with that rather than the challenges of synchronizing between separate Access and MySQL data stores. (YMMV)
When you're ready to deploy, you can convert the ODBC links to DSN-less connections so the client machines wouldn't need to each have the DSN configured. See Using DSN-Less Connections by Doug Steele, Access MVP, for detailed instructions.
You will need to think very carefully about how you identify the data which has changed since the last synchronization cycle. If every row of data has a 'last updated' timestamp (that is indexed) then you could write a process that selected the recently updated rows from each table in turn. That's apt to be a bit heavy on the originating database (MS Access), plus you still have to identify the corresponding row to replace (where replacement is required) in the MySQL database. Of course, you can put different tables on different change schedules. For example, the table of US states probably doesn't change once a year, but your customer orders tables (or SO questions and answers tables) may change a lot in five minutes.
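A rough sketch of that cycle (table and column names are hypothetical, and @last_sync_time is whatever you recorded after the previous run): pull the changed rows from the source, then upsert them by primary key on the MySQL side:

SELECT id, item, amount, last_updated
FROM sales
WHERE last_updated > @last_sync_time;

INSERT INTO sales (id, item, amount, last_updated)
VALUES (?, ?, ?, ?)
ON DUPLICATE KEY UPDATE
    item = VALUES(item),
    amount = VALUES(amount),
    last_updated = VALUES(last_updated);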
Some DBMS have alternative mechanisms, especially for working between copies of themselves. Some DBMS also provide a mechanism that is sometimes called 'changed data capture' (CDC) that allows you to get the changed data. Sometimes, in DBMS where you have a 'transaction log' or 'logical log' (but not CDC or something similar), you can 'mine' the log files (or log backups) to find the changes. However, the logs are typically optimized for the DBMS internal recovery processes, not for your use.
Well, obviously you will have to keep track of the data items you have already processed (perhaps in a separate metadata store) so you can avoid the redundancy. That metadata should be used to filter out already-processed records from the source. The exact logic, and what needs to be in the metadata, depends on the use case.

Securing tables vs databases on a multitool web site with confidential information

I am working on a site that multiple projects will use to enter confidential subject information for various research projects. Project data access will be limited to specific users and tools, but certain core data will be referenced in and joined to the project tables (username, project metadata, etc.). The current plan is that each project will have MySQL users with any combination of SELECT, UPDATE, or INSERT rights as needed, plus an overall project administrator user that can alter the shape of the project's tables and will only be used in phpMyAdmin. We are using a Database object with some backtrace logic to determine which object passed it connection credentials, and it will only allow that connection to be used by the originating object (not impossible for a dedicated programmer to get around, but it would throw up red flags in code review). And we are following the standard practice of moving the config out of the web root and keeping all credentials in config files instead of code. Of course there is an overall administrator, but that account has many access restrictions and its password is ludicrously long (a static YubiKey plus a 10-character password).
What I want to know is whether to separate the project data out into their own databases, or to put them in tables with access limited to certain accounts. Setting user permissions at the database or the table level seems to be about equal in difficulty. There will be joins and other such operations between the core tables (mostly metadata) and the protected data. Joining across databases on the same server works fine, but I am uncertain how the performance of intra-database joins compares to inter-database joins.
It doesn't matter if you put them in the same database or in different ones. You can implement a good (or a bad) security concept with both alternatives.
If you are using one database and you put data for different users in one table, you will have to implement a lot of the access control in your application.
If you have separated the data completely into different tables (or even databases), you can simply use MySQL's access control. In this case I would go with separate databases, because it is more convenient when setting up a backup system or if you want to scale your application over more than one machine. But since you want to join across the different databases, you're going to lose some of those advantages, so it doesn't really matter much either way.
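For the separate-databases route, the MySQL side is just a few grants per project (a sketch with hypothetical names), including read-only access to the shared core data for the joins you mention:

CREATE USER 'project_a'@'localhost' IDENTIFIED BY 'strong password';
GRANT SELECT, INSERT, UPDATE ON project_a_db.* TO 'project_a'@'localhost';
GRANT SELECT ON core_db.project_metadata TO 'project_a'@'localhost';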