Spring Batch recommended table indexes - MySQL

I'm looking for recommended indexes for the Spring Batch tables,
especially when using this API:
jobExplorer.findRunningJobExecutions(jobName);
Anyone?

Assuming you are using the JdbcJobExecutionDao implementation with the SimpleJobExplorer, the only columns involved in the WHERE clause of that query are as follows:
JOB_EXECUTION.JOB_INSTANCE_ID
JOB_EXECUTION.END_TIME
JOB_INSTANCE.JOB_INSTANCE_ID
JOB_INSTANCE.JOB_NAME
The ORDER BY uses JOB_EXECUTION.JOB_EXECUTION_ID.
You can take a look at the source of JdbcJobExecutionDao for the actual queries. As for which indexes to create: JOB_INSTANCE_ID seems to be a good candidate in this case.
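As a minimal sketch, assuming the default BATCH_ table prefix that Spring Batch ships with (the column list above abbreviates the table names), the DDL might look like this; the index names are made up:

    -- Covers the JOB_INSTANCE_ID join/filter plus the END_TIME IS NULL
    -- check that identifies still-running executions.
    CREATE INDEX idx_job_exec_instance
        ON BATCH_JOB_EXECUTION (JOB_INSTANCE_ID, END_TIME);

    -- JOB_NAME is the filter on the instance table; JOB_INSTANCE_ID is
    -- already its primary key, so it needs no extra index.
    CREATE INDEX idx_job_inst_name
        ON BATCH_JOB_INSTANCE (JOB_NAME);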

Related

MongoDB or MySQL database

I have a question about deciding whether to use a MySQL database or a Mongo database; the problem is that my decision depends heavily on one thing:
I want to select records between two dates (a period).
Is this possible?
My application won't do any complex queries, just basic CRUD. It has Facebook integration, so in the current setup I sometimes have to JOIN against the users table.
Either DB will allow you to filter between dates and I wouldn't use that requirement to make the decision. Some questions you should answer:
Do you need to store your data in a relational system like MySQL? Relational databases are better at cross-entity joining.
Will your data be complicated while your queries stay simple (e.g. lookups by ID)? If so, MongoDB may be a better fit, as storing and retrieving complex data is a cinch.
Who will be querying the data, and from where? MySQL uses SQL for querying, which is a much more widely known skill than Mongo's JSON query syntax.
These are just three questions to ask. In order to make a recommendation, we'd need to know more about your application.
MySQL (SQL) or MongoDB (NoSQL): both can work for your needs, but the choice between an RDBMS and NoSQL comes down to your application's requirements.
If your application cares about speed, needs no relations between the data, and its schema changes very frequently, you can choose MongoDB; it is faster since no joins are needed and each record is stored as a document.
Otherwise, go for MySQL.
If you are looking for range queries in MongoDB - yes, Mongo supports those. For date-based range queries, have a look at this: http://cookbook.mongodb.org/patterns/date_range/
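For the MySQL side of that requirement, selecting between two dates is a plain WHERE clause; the table and column names below are hypothetical:

    -- Half-open range: catches rows with a time component on the last day.
    SELECT *
    FROM   orders
    WHERE  created_at >= '2012-01-01'
      AND  created_at <  '2012-02-01';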

Joining Tables Based on Foreign Keys

I have a table with a lot of fields that are foreign keys referencing related tables. I am writing a script in PHP that will do the DB queries. When I query this table for its data, I need to know the values associated with these keys, not the keys themselves.
How do most people go about this?
A 101 way to do this would be to query this table for its data, including the foreign keys, and then query each related table to get the key's value. That could be a lot of queries (~10).
Question 1: I think I could write 1 query with a bunch of joins. Would that be better?
This approach also requires the querying script to know which table fields are foreign keys. Since I have many tables like this, each with different fields, it makes writing nice generic functions hard. MySQL InnoDB tables allow foreign key constraints, and I know the database has these set up correctly.
Question 2: What about querying the table, identifying what its constraints are, and then matching them up using whatever process I decide on from Question 1? I like this idea but never see it used in code, which makes me think it's not a good idea for some reason. I would use something like SHOW CREATE TABLE tbl_name; to find what constraints/relationships exist for that table.
Thank you for any suggestions or advice.
You talk about writing "nice generic functions", but I think you are thinking a little TOO generic here.
Personally I would just write a query with a bunch of joins in it. If you want to abstract all that join logic away and not have to worry about it, then you should probably look at using an ORM instead of writing the SQL directly.
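As a rough sketch of that single-query approach (every table and column name here is invented), one statement resolves several FK columns to their display values at once:

    SELECT o.id,
           c.name  AS customer_name,
           s.label AS status_label,
           w.city  AS warehouse_city
    FROM   orders o
    JOIN   customers  c ON c.id = o.customer_id
    JOIN   statuses   s ON s.id = o.status_id
    JOIN   warehouses w ON w.id = o.warehouse_id;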
At some level, the system should run queries using joins, whether those queries are written explicitly by the application programmer or generated automatically by the data access layer. Option 1 is definitely better than the naive option. As for some other query creation options (by no means an exhaustive list):
You could abstract out all database operations, much as PDO abstracts out connecting and query operations (i.e. preparing & executing queries). Use this to get table metadata, including foreign keys, which could then be used to construct queries automatically.
You could write object specifications in some other format (e.g. XML) and a class that would use that to both generate PHP classes and database tables. You find this more in Enterprise applications than smaller projects. This option has more overhead than others, and thus isn't suitable if you only have a few classes to model. Occurrences of this option might also be a consequence of Conway's Law, which I first heard as Richard Fairly's variant: "The structure of a system reflects the structure of the organization that built it."
You could take a LINQ-like approach. In PHP, this would mean writing numerous functions or methods that the application programmer can chain together which would create a query. The application programmers are ultimately responsible for joining tables, though they never write a JOIN themselves.
Rather than thinking about how to create the queries, a better approach to the problem is to think about how to interface the DB and the application. This leads to patterns such as Data Mapper and Active Record that fall into the category of Object-Relational Mapping (ORM). Note that some patterns (such as AR), other ORM techniques, and even ORM itself have issues of their own. Any of the above query creation options can be used in the implementation of a data access pattern.
The problem with using SHOW CREATE TABLE is it doesn't work with most (all?) other RDBMSs. If you want to marry your app to MySQL, go ahead, but the decision could haunt you.
What kind of record counts are you working with, both in the main data table(s) and the lookup tables?
As a general rule, you should join the lookup tables to the main table. If you have an excessive number of joins and there aren't many UDFs involved, there's a pretty good chance the table should be normalized a bit more. If the normalization is fine and the main data table is really wide, you could still split it into multiple tables with 1:1 relationships so as to separate the frequently accessed columns from the infrequently accessed ones, as sketched below.
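A hedged sketch of such a 1:1 split, with invented names: the hot columns stay in the main table and the wide, rarely-read columns move to a companion table sharing the same key:

    CREATE TABLE products (
        id    INT PRIMARY KEY,
        name  VARCHAR(100) NOT NULL,
        price DECIMAL(10,2) NOT NULL
    );

    -- Same primary key, joined 1:1 only when the detail is needed.
    CREATE TABLE product_details (
        id               INT PRIMARY KEY,
        long_description TEXT,
        FOREIGN KEY (id) REFERENCES products (id)
    );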
MySQL includes support for the ANSI catalog INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS. You could use that to gather information on the FK relationships that exist.
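For example, something along these lines lists a table's FK relationships (MySQL's KEY_COLUMN_USAGE view carries the REFERENCED_* columns; the schema and table names are placeholders):

    SELECT constraint_name,
           column_name,
           referenced_table_name,
           referenced_column_name
    FROM   information_schema.key_column_usage
    WHERE  table_schema = 'my_db'
      AND  table_name   = 'my_table'
      AND  referenced_table_name IS NOT NULL;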
Also, if there are combinations of joins you use frequently, create views or stored procedures based on those common operations.

Multiple-table sharding with MySQL

I'm making a GPS app that will deal with 200 million records in a table. My initial thought is to divide the table into multiple tables like position_1, position_2, ... and split the data among them.
My question is: is there any performance gain from this with MySQL (InnoDB)?
The real problem is to create the relevant indexes that match the queries.
The InnoDB table size itself (see the InnoDB-specific chapter of the manual) shouldn't be a problem.
As long as the indexes are right, application development and maintenance using a single table will be much easier.
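For instance, if the common query is "positions for one device over a time window", a composite index along these lines (all names hypothetical) would serve it:

    -- Equality on device_id first, then the range on recorded_at.
    CREATE INDEX idx_positions_device_time
        ON positions (device_id, recorded_at);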
You don't need to create multiple tables, but you might get a performance enhancement by partitioning the single table across different disks. MySQL does have support for partitioning:
http://dev.mysql.com/doc/refman/5.1/en/partitioning-overview.html
I'd suggest you measure performance with and without partitioning before you decide to use it.
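A hedged sketch of what that could look like for a positions table, following the RANGE syntax from the manual page above (all names and ranges are invented):

    CREATE TABLE positions (
        id          BIGINT   NOT NULL,
        device_id   INT      NOT NULL,
        recorded_at DATETIME NOT NULL,
        lat         DOUBLE,
        lng         DOUBLE,
        -- MySQL requires the partitioning column in every unique key,
        -- hence the composite primary key.
        PRIMARY KEY (id, recorded_at)
    ) ENGINE=InnoDB
    PARTITION BY RANGE (YEAR(recorded_at)) (
        PARTITION p2010 VALUES LESS THAN (2011),
        PARTITION p2011 VALUES LESS THAN (2012),
        PARTITION pmax  VALUES LESS THAN MAXVALUE
    );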

MySQL InnoDB Text Search Options

Knowing full well that my InnoDB tables don't support FULLTEXT searches, I'm wondering what my alternatives are for searching text in those tables. Is the performance really that bad when using LIKE?
I see a lot of suggestions saying to make a copy of the InnoDB table in question as a MyISAM table, run queries against THAT table, and match keys between the two, but I just don't think that's a pretty solution.
I'm not opposed to using some 3rd-party solution, though I'm not a huge fan of that. I'd like to explore more of what MySQL can do on its own.
Thoughts?
If you want to do it right, you should probably go with Lucene or Sphinx from the very start:
It will allow you to keep your table structure.
You'll get a huge performance boost (think ahead).
You'll get access to a lot of fancy search functions.
Both Lucene and Sphinx scale amazingly well (Lucene powers Wikipedia and Digg; Sphinx powers Slashdot).
LIKE can only use an index when there is no leading %, so doing LIKE '%foo%' on a large table is a huge performance hit. If I were you, I'd look into using Sphinx. It has the ability to build its index by slurping data out of MySQL using a query that you provide. It's pretty straightforward and was designed to solve your exact problem.
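To illustrate the LIKE point with a hypothetical articles table indexed on title:

    -- Prefix match: MySQL can use the index on title.
    SELECT * FROM articles WHERE title LIKE 'foo%';

    -- Leading wildcard: the index is useless and the table is scanned.
    SELECT * FROM articles WHERE title LIKE '%foo%';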
There's also Solr, which is an HTTP wrapper around Lucene, but I find Sphinx to be a little more straightforward.
As others have, I would urge the use of Lucene, Sphinx, or Solr.
However, if those are out and your requirements are simple, I've used the steps here to build simple search capability on a number of projects in the past.
That link is for Symfony/PHP, but you can apply the concepts to any language and application structure, assuming there is an implementation of a stemming algorithm available. However, if you don't use a data access pattern where you can hook in to update the index when a record is updated, it's not as easily doable.
A couple of downsides: if you want a single index table but need to index multiple tables, you either have to emulate referential integrity in your DAL or add an FK column for each different table you want to index. I'm not sure what you're trying to do, so that may rule this out entirely.

Using Linq-to-SQL preload without joins

Like many people, I am trying to squeeze the best performance out of my app while keeping the code as simple and readable as possible. I am using Linq-to-SQL and am really trying to keep my data layer as declarative as possible.
I operate on the assumption that SQL calls are the most expensive operations. Thus, I try to minimize them in quantity, but try to avoid crazy complex queries that are hard to optimize.
Case in point: I am using DataLoadOptions with my DataContext -- its goal is to minimize the quantity of queries by preloading related entities. (Aka, eager loading vs lazy loading.)
Problem: Linq uses joins to achieve the goal. As with everything, it's a trade-off. I am getting fewer queries, but those joined queries are more complex and expensive. Going into SQL Profiler makes this clear.
So, I'd like an option in Linq to preload without joins. Is this possible? Here's what it might look like:
I have a Persons table, an Items table, and a PersonItems table to provide a many-to-many relationship. When loading a collection of Persons, I'd like to have all their PersonItems and Items eagerly loaded as well.
Linq currently does this with one large query, containing two joins. What I'd rather it do is three non-join queries: one for Persons, one for all the PersonItems relating to those Persons, and one for all Items relating to those PersonItems. Then Linq would automagically arrange them into the related entities.
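Sketched as SQL against the tables named above (the column names are my guesses), those three queries might be:

    SELECT * FROM Persons;                        -- 1: the root entities

    SELECT * FROM PersonItems                     -- 2: the link rows
    WHERE  PersonId IN (/* ids from query 1 */);

    SELECT * FROM Items                           -- 3: the related items
    WHERE  Id IN (/* ItemIds from query 2 */);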
Each of these would be a fast, firehose-type query. Over the long haul, it would allow for predictable, web-scale performance.
Ever seen it done?
I believe what you describe, where three non-join queries are done, is essentially what happens under the hood when a single join query is performed. I could be wrong, but if that is the case, the single query will be more efficient, as only one database round trip is involved as opposed to three.
If you are having performance issues, I'd make sure the columns you are joining on are indexed (you should see no table scans in SQL Profiler). If that is not enough, you could write a custom stored procedure to get just the data you need (assuming you don't need every column of every object, this will let you make use of index seeks, which are faster than index scans), or alternatively you could denormalise (duplicate data across your tables) so that no joining occurs at all.
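For the indexing suggestion, assuming the Persons/PersonItems/Items schema from the question, the junction table's FK columns are the ones the joins seek on (index names are invented):

    CREATE INDEX IX_PersonItems_PersonId ON PersonItems (PersonId);
    CREATE INDEX IX_PersonItems_ItemId   ON PersonItems (ItemId);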