Hadoop recommends prime number of map / reduce tasks?

I saw these in a template configuration file:
<property>
<name>mapred.map.tasks</name>
<value>2</value>
<description>The default number of map tasks per job. Typically set
to a prime several times greater than number of available hosts.
Ignored when mapred.job.tracker is "local".
</description>
</property>
...
<property>
<name>mapred.reduce.tasks</name>
<value>1</value>
<description>The default number of reduce tasks per job. Typically set
to a prime close to the number of available hosts. Ignored when
mapred.job.tracker is "local".
</description>
</property>
I couldn't find any other reference, either online or in the O'Reilly Hadoop book, explaining why these should be prime. Does anyone have any ideas?
Thanks.

See HADOOP-5519; this is no longer in the configuration file as there was no (or little) reason for it.
I haven't seen it for at least two versions, and JIRA says it was resolved a couple of years ago.

Related

Why does numUnclosedOrphanedConnections in c3p0 increase?

I am using c3p0 version 0.9.1.2, and after an hour or so I see numUnclosedOrphanedConnections increasing slowly, by about 1 per hour. The c3p0 docs say that
numUnclosedOrphanedConnections will only be non-zero following a call
to softReset(). It represents the number of Connections that were
checked out when a soft reset occurred and were therefore silently
excluded from the pool, and which remain unclosed by the client
application.
Why is c3p0 doing a soft reset? My c3p0 settings are:
initialPoolSize=1
minPoolSize=1
maxPoolSize=100
maxIdleTime=60
checkoutTimeout=5000
testConnectionOnCheckin=true

Thanks to Steve for helping me fix it. This is how I did it.
Enable DEBUG-level logging for c3p0:
<logger name="com.mchange" additivity="false">
<level value="DEBUG" />
<appender-ref ref="C3p0Appender" />
</logger>
c3p0 settings:
debugUnreturnedConnectionStackTraces=true
# 30 sec is enough for me but you should change it for your case
unreturnedConnectionTimeout=30
The keyword to search for in the c3p0 log file is: "Overdue resource check-out "
This logging is only enabled in the trunk version of c3p0. It should be included in the pre6 release.

JBoss AS7: logging with logback

I would like to use slf4j+logback for logging on JBoss AS7.
Additionally, I have to meet the following requirements:
I need to share one logback configuration / context among multiple deployed applications/EARs
I need to change the logback configuration at runtime without a redeploy/restart of the EARs
I need to make as many of the JBoss server's log entries as possible visible inside my logging configuration (e.g. deployment logs, etc.)
What I know so far is that JBoss uses its own logging layer. For architectural reasons, I cannot use this. I would like to stick with only SLF4J as the logging API and Logback as the framework.
I would be happy to get some hints on how this could be solved.
Regards,
Lars

Lars,
The only way I can think of to do this would be to write a custom handler. While it's not very well documented at the moment, you can create custom java.util.logging.Handler implementations. You could write a wrapper of sorts around logback's configuration. I think they have a BasicConfigurator or something like that.
You register a custom handler like so:
<custom-handler name="logbackHandler" class="org.jboss.LogbackHandler" module="org.jboss.logback">
<level name="DEBUG"/>
<properties>
<property name="nameOfASetterMethod" value="the value to set" />
</properties>
</custom-handler>
<root-logger>
<level name="INFO"/>
<handlers>
<handler name="CONSOLE"/>
<handler name="FILE"/>
<handler name="logbackHandler"/>
</handlers>
</root-logger>
That said there is probably no real need to do that. The application server logger will log the messages even if you are logging through a different façade. You can set-up different file handlers if you want to write to your own files.
I realize logging in JBoss AS7 could really use some better documentation. I do plan on updating that when I find the time :-) And really I just need to make the time.
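For what it's worth, the custom handler idea above could be sketched roughly like this. It relies only on java.util.logging. The class name ForwardingHandler, the prefix setter, and the Consumer sink are all made up for illustration; in a real handler the sink would presumably hand the message to an SLF4J/logback logger instead of a plain callback, and how the server maps <property> entries onto setters is an assumption here.

```java
import java.util.function.Consumer;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.SimpleFormatter;

/**
 * Hypothetical handler that forwards java.util.logging records to another sink.
 * The sink stands in for whatever would pass the message on to logback.
 */
class ForwardingHandler extends Handler {
    private final Consumer<String> sink;
    private String prefix = "";

    /** No-arg constructor, since a container would typically instantiate the handler itself. */
    public ForwardingHandler() {
        this(System.out::println);
    }

    public ForwardingHandler(Consumer<String> sink) {
        this.sink = sink;
        setFormatter(new SimpleFormatter());
    }

    /** Example of a setter that a configuration property could drive (assumed name). */
    public void setPrefix(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public void publish(LogRecord record) {
        // Respect the handler's own level and filter before forwarding.
        if (record == null || !isLoggable(record)) {
            return;
        }
        // formatMessage only resolves {0}-style parameters and resource bundles.
        String message = getFormatter().formatMessage(record);
        sink.accept(prefix + record.getLevel().getName() + " "
                + record.getLoggerName() + " - " + message);
    }

    @Override
    public void flush() {
        // Nothing is buffered in this sketch.
    }

    @Override
    public void close() {
        // No resources to release in this sketch.
    }
}
```

Attaching it to a plain java.util.logging.Logger with addHandler(...) is enough to try it outside the server; inside JBoss it would be wired up through the custom-handler element shown above.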

I am pretty sure that you can use slf4j+logback for your own applications within JBoss and completely bypass its logging. JBoss will continue logging all of its own log messages to its own logs, but your software will not connect to jboss-logging at all and will have its own logs. I have tried this under JBoss 6; we have not yet tried JBoss 7, so things may be different there, but I doubt it. Just make sure slf4j and logback jars are in your applications' classpaths, and you should be good.
If you search through the System properties available to you, you will find some jboss.* properties that may be useful in your logback configuration for finding a place to put your log files.
Personally, I wish JBoss would switch to using slf4j.

Relationships being deleted on associated objects in Hibernate+Gilead application

Edit 5/11/2011:
I guess it's a bit worse than what's below; in my deployed QA instance, if I just refresh the main dashboard a number of times, eventually my user's many-to-many group associations are deleted. At this point, there are only select statements being called on the server side; hopefully I'm getting it narrowed down with these latest tests.
Original:
Hi all. I'm having a problem with a rather complex object; problem is as follows: when I send the object from the client to server to be saved, it is seemingly randomly wiping out many-to-many relationships on associated objects. Worse yet, I am not able to reproduce the problem myself, after roughly two months of being aware of the problem. I have the app out for testing with a QA group; they're using the program daily, double entering in the new and legacy applications. The problem crops up as much as three times a day.
I'll do my best to provide as much detail as I can, and greatly appreciate anyone taking a look!
The app framework is GWT 2.1 + Gilead + Hibernate 3 + MySQL InnoDB. I'm letting Hibernate handle any cascading etc, so none is defined in the DB, although all foreign keys are set in the DB.
Here's some excerpts from the mappings:
<hibernate-mapping>
<class name="com.example.domain.Close" table="CLOSE">
<many-to-one name="updateUser"
class="com.example.domain.User"
column="LAST_UPDATE_USER"/>
</class>
</hibernate-mapping>
<hibernate-mapping>
<class name="com.example.domain.User" table="USER" batch-size="25">
<set name="groups" table="USER_GROUP" lazy="true" batch-size="25">
<key column="USER_ID"/>
<many-to-many column="GROUP_ID" class="com.example.domain.Group"/>
</set>
</class>
</hibernate-mapping>
<hibernate-mapping>
<class name="com.example.domain.Group"
table="GROUP" batch-size="25">
<set name="users" table="USER_GROUP" lazy="true" inverse="true">
<key column="GROUP_ID"/>
<many-to-many column="USER_ID" class="com.example.domain.User"/>
</set>
<set name="permissions" table="PERMISSION_GROUP" lazy="true" inverse="true">
<key column="GROUP_ID"/>
<many-to-many column="PERMISSION_ID"
class="com.example.domain.Permission"/>
</set>
</class>
</hibernate-mapping>
<hibernate-mapping>
<class name="com.example.domain.Permission"
table="PERMISSION">
<set name="groups" table="PERMISSION_GROUP" lazy="true">
<key column="PERMISSION_ID"/>
<many-to-many column="GROUP_ID"
class="com.example.domain.Group"/>
</set>
</class>
</hibernate-mapping>
Saving the object is simply a call to saveOrUpdate():
Session session = gileadHibernateUtil.getSessionFactory()
        .getCurrentSession();
session.beginTransaction();
try {
    session.saveOrUpdate(close);
    // Commit only if the save succeeded
    session.getTransaction().commit();
} catch (Exception e) {
    e.printStackTrace();
    // Undo the failed unit of work instead of committing it
    session.getTransaction().rollback();
}
return close;
The Close 'updateUser' is an object loaded when the user logs in. It is loaded with the associated groups and permissions so the system can grant/deny access to app modules. I do
close.setUpdateUser(exampleApp.getUser());
before sending the object back to the server.
There are plenty of other places in the app where this sort of operation happens, but doesn't cause the unwanted side-effects. It probably boils down to the complexity of the client-side code associated with the Close object, or rather, my implementation of it.
I've spent so much time poring over the official Hibernate docs, looking for possibly related problems, etc., that I thought maybe it was a good time to ask for help. I have to man up and keep at it, but maybe just asking will help me figure it out.
I'm not sure what else to provide right now that's relevant. Hopefully what is here so far has relevance!
Thanks for listening!
Edit
May 5 12:18:38 localhost jsvc.exec[10117]: Hibernate: insert into example_dev.RECENT_ITEM (OBJECT_TYPE, OBJECT_ID, DATE, USER_ID) values (?, ?, ?, ?)
May 5 12:18:38 localhost jsvc.exec[10117]: Hibernate: delete from example_dev.PERMISSION_GROUP where PERMISSION_ID=?
May 5 12:18:38 localhost last message repeated 19 times
May 5 12:18:38 localhost jsvc.exec[10117]: Hibernate: delete from example_dev.USER_GROUP where USER_ID=?
May 5 12:18:38 localhost jsvc.exec[10117]: Hibernate: delete from example_dev.USER_DESIGNATION where USER_ID=?
It appears that the deletes are happening right after this insert; the previous operations are all selects. But nothing in User should be cascading from RecentItem.

After a lot of research, I came to some conclusions and was able to take action. First off, I learned after quite a bit of searching on the Gilead forum, that it is no longer being actively maintained. Should have noticed that sooner. Meanwhile, I had started reading about RequestFactory, and after a couple days of research, decided I should try migrating to this.
This is a rather large project, roughly 50 domain objects, some with many object associations. It took me around 40-50 hours to rewrite everything from using Gilead + GWT RPC to using RequestFactory exclusively. I am pretty happy with the resulting changes in code and structure. I'm not too bothered thus far by having to create DTO proxy objects, and I took the opportunity to switch to Hibernate Annotations, getting rid of the mapping files.
It was tricky at times to refactor the code to utilize the fetch/edit/save cycles that RequestFactory requires. It did give me the opportunity to improve upon some of the code.
The good news is, the problem has been resolved. No more many-to-many associations being mysteriously deleted. My best guess is I was hitting a bug in Gilead, or my usage of it was incorrect, or it is possible I resolved a problem when migrating to Annotations.
I ran across some great resources while learning RequestFactory+Hibernate, many through StackOverflow (thanks again!):
Using GWT RequestFactory with Objectify - was great to get a feel for how RequestFactory interacted with the backend, and some methods and boilerplate to cut down on code.
More links below hopefully.. I'm still a noob so I am limited in # of hyperlinks I can post :)
I learned a lot, and am getting fluent in RequestFactory. I'll do my best to keep an eye out and help out here where I think I can.
Thanks StackOverflow!

Hibernate/Spring: getHibernateTemplate().save(...) Freezes/Hangs

I'm using Hibernate and Spring with the DAO pattern (all Hibernate dependencies in a *DAO.java class). I have nine unit tests (JUnit) which create some business objects, save them, and perform operations on them; the objects are in a hash (so I'm reusing the same objects all the time).
My JUnit setup method calls my DAO.deleteAllObjects() method which calls getSession().createSQLQuery("DELETE FROM <tablename>").executeUpdate() for my business object table (just one).
One of my unit tests (#8/9) freezes. I presumed it was a database deadlock, because the Hibernate log file shows my delete statement last. However, debugging showed that it's simply HibernateTemplate.save(someObject) that's freezing. (Eclipse shows that it's freezing on HibernateTemplate.save(Object), line 694.)
Also interesting to note is that running this test by itself (not in the suite of 9 tests) doesn't cause any problems.
How on earth do I troubleshoot and fix this?
Also, I'm using @Entity annotations, if that matters.
Edit: I removed reuse of my business objects (use unique objects in every method) -- didn't make a difference (still freezes).
Edit: This started trickling into other tests, too (can't run more than one test class without getting something freezing)
Edit: Breaking the freezing tests into two classes works. I'm going to do that for now, as shamefully un-DRY as it is to have two or more test classes unit-testing the same one business object class.
Transaction configuration:
<bean id="txManager"
class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
<property name="dataSource" ref="dataSource" />
</bean>
<tx:advice id="txAdvice" transaction-manager="txManager">
<!-- the transactional semantics... -->
<tx:attributes>
<!-- all methods starting with 'get' are read-only -->
<tx:method name="get*" read-only="true" />
<tx:method name="find*" read-only="true" />
<!-- other methods use the default transaction settings (see below) -->
<tx:method name="*" />
</tx:attributes>
</tx:advice>
<!-- my bean which is exhibiting the hanging behavior -->
<aop:config>
<aop:pointcut id="beanNameHere"
expression="execution(* com.blah.blah.IMyDAO.*(..))" />
<aop:advisor advice-ref="txAdvice" pointcut-ref="beanNameHere" />
</aop:config>

When the freeze happens, pause the application, find the main thread, and capture the stack trace. Poke through it until you find exactly which DB query is running and blocking in the DB.
You mention that running the test on its own works OK but running the full suite causes a problem. If this is the case, then I would guess that one of the prior tests still has a transaction open and holds locks on some rows which the blocking test is trying to access.
Do your tests run concurrently? If so, stop doing that, as they could interfere with each other.
Turn on the hibernate.show_sql option so you can see all the generated SQL in the console.
At the point the freeze happens, can you find out which rows are locked in the DB? E.g. in SQL Server you can run sp_lock to see this, and sp_who to see which SQL process IDs are blocked by another.
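If attaching a debugger is awkward (e.g. the hang only shows up in the full suite), the stack traces can also be captured from code. The following is only a sketch using the plain JDK; the class name StackDump, the method names, and the "suspect-worker" thread name are made up for illustration. It runs a task on a worker thread and, if the task has not finished within the timeout, returns the stack of every live thread so you can see where the worker is stuck.

```java
import java.util.Map;

/** Hypothetical helper for spotting where a hanging call is stuck. */
final class StackDump {
    private StackDump() {}

    /** Returns the name, state, and stack frames of every live thread as text. */
    static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            out.append('"').append(thread.getName()).append("\" state=")
               .append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    /**
     * Runs the task on a daemon worker thread. Returns an empty string if it
     * finishes within timeoutMillis, otherwise a dump of all thread stacks.
     * Note that a timeout of 0 would make join() wait forever.
     */
    static String dumpIfStuck(Runnable task, long timeoutMillis) {
        Thread worker = new Thread(task, "suspect-worker");
        worker.setDaemon(true); // do not keep the JVM alive if the task never returns
        worker.start();
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        return worker.isAlive() ? dumpAllThreads() : "";
    }
}
```

In a test you could wrap the suspicious save call in dumpIfStuck(...) and print the result. A thread waiting on a database lock usually shows up blocked inside a JDBC driver socket read, which at least tells you which statement to look for on the DB side.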

A few things to check:
proper transaction management - it appears that your configuration puts transactions around one of your DAOs. Generally it is advisable to have transactions around your service layer, not the DAO. But anyway, make sure you have a transaction around the DAO used by the test. Or make the test @Transactional (if using Spring's JUnit runner)
change the logging threshold to info for the datasource (c3p0, perhaps?). It reports deadlocks.
watch the database logs for deadlocks (if there is such option)

Code vs. configuration for game object library

I'm working on a small online game where there is a need to store a reasonable amount of information about many (100+) different kinds of game objects.
I'm trying to decide whether to have this data generated by code or stored in some configuration file.
Data generation approach would be something like (in java-ish pseudo code):
(within a set of functions executed once at program startup)
....
// create grass terrain
grass=new GameObject();
grass.inheritProperties(generic_terrain);
grass.set(NAME,"grass");
grass.set(MOVEABLE,true);
grass.set(MOVECOST,10);
grass.set(IMAGE_INDEX,1);
....
Whereas the config file approach would probably just use an XML-type format e.g.
(within terrain.xml file)
....
<terrain name="grass">
<inherit class="generic_terrain"/>
<property key="NAME" value="grass"/>
<property key="MOVEABLE" value="true"/>
<property key="MOVECOST" value="10"/>
<property key="IMAGE_INDEX" value="1"/>
</terrain>
....
Some important points:
This information is static each time the game is run (i.e. it does not change during execution)
The property names (NAME, MOVECOST etc.) are a relatively small list but additional ones could be added over time
It is safe to assume that it will only get changed by the development team (i.e. there is no need for configuration to be managed outside the build process).
It will need to be tweaked quite regularly during development for game balancing reasons (e.g. making units less/more powerful)
There is a certain amount of "inheritance" of properties, i.e. in the example above grass needs to have all the standard properties defined by generic_terrain plus a few new additions/changes.
Which approach would be best for this situation? And more importantly, why?

Personally I like to push as much to config as possible. The reason for this is that at some point I may want to reuse the code I wrote for the game in a completely different way. If the source code is littered with references to implementation specific details this becomes much harder.
One interesting caveat to the config approach comes up when you want to start describing the behaviors of objects in addition to their values. Consider a simple example where you have a cup object which needs to "catch" ball objects. You might express them like:
<object name="ball">
<property key="shape" value="circle"/>
<property key="movable" value="true"/>
<property key="speed" value="10"/>
</object>
<object name="cup">
<property key="shape" value="rectangle"/>
<property key="movable" value="true"/>
<property key="speed" value="6"/>
<property key="catches" value="ball"/>
</object>
The problem here is that somewhere you still have to define what "catches" does inside your code. If you are using an interpreted language you could do something like:
<object name="cup">
<property key="shape" value="rectangle"/>
<property key="movable" value="true"/>
<property key="speed" value="6"/>
<oncollision>
if (collided.getName() == "ball") {
collided.destroy();
points++;
}
</oncollision>
</object>
Now you have gained the ability to describe how an object behaves as well as what it is. The only problem here is that if you are not working in an interpreted language you do not have the luxury of defining code at run time. This is one of the reasons Lua has become so popular in game development. It works well as both a declarative and procedural language and it is easy to embed in a compiled application. So you may express this situation like:
object {
name='ball';
movable=true;
speed=10;
}
object {
name='cup';
movable=true;
speed=6;
oncollision=function(collided)
if collided:getName() == "ball" then
collided:destroy();
points = points + 1;
end
end;
}

Separating data from code is just about ALWAYS a good idea. Even if the data is static during execution, it isn't during the design process. A game with hard-coded data is much less flexible than one which gathers its data from an easily-modifiable config file.
Keeping the data in separate files, xml for example, allows for quick and simple altering of the various values, which you say is important.
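As a rough illustration of the config route, here is a sketch of loading the terrain definitions from the question with the JDK's built-in DOM parser, including the "inherit" step. Several things are assumptions made for the example: the class name TerrainLoader, a wrapping <terrains> root element, and the rule that a parent such as generic_terrain is defined earlier in the file than anything inheriting from it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical loader for the terrain.xml format shown in the question. */
final class TerrainLoader {
    private TerrainLoader() {}

    /** Returns terrain name -> fully resolved properties (inherited plus own). */
    static Map<String, Map<String, String>> load(String xml) {
        Map<String, Map<String, String>> resolved = new LinkedHashMap<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList terrains = doc.getElementsByTagName("terrain");
            for (int i = 0; i < terrains.getLength(); i++) {
                Element terrain = (Element) terrains.item(i);
                Map<String, String> props = new LinkedHashMap<>();

                // Copy the parent's properties first so the terrain's own values override them.
                NodeList inherits = terrain.getElementsByTagName("inherit");
                for (int j = 0; j < inherits.getLength(); j++) {
                    String parentName = ((Element) inherits.item(j)).getAttribute("class");
                    Map<String, String> parent = resolved.get(parentName);
                    if (parent == null) {
                        throw new IllegalArgumentException("unknown parent: " + parentName);
                    }
                    props.putAll(parent);
                }

                NodeList own = terrain.getElementsByTagName("property");
                for (int j = 0; j < own.getLength(); j++) {
                    Element property = (Element) own.item(j);
                    props.put(property.getAttribute("key"), property.getAttribute("value"));
                }
                resolved.put(terrain.getAttribute("name"), props);
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse terrain XML", e);
        }
        return resolved;
    }
}
```

With this shape, tweaking MOVECOST for balancing is a one-line edit to the XML file with no recompile, which is the main argument for keeping the data out of the code.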

This isn't language-agnostic.
If you're using a compiled language, use a configuration file so that you don't force a recompile every time you tweak something.
If you're using a language where you don't have to perform an explicit compile/link process, do it in code so that you don't have to deal with parsing and loading. (But do it in one place so that it's easy to completely swap out functionality, should you need to do so at some point in the future).
The basic philosophy here is that code is data, but sometimes code-as-data is painfully difficult to modify; in such cases (e.g., the compiled-language case), write it in a kind of code that's easier to modify. (Your interpreted configuration language.)
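To make the "do it in code, but in one place" option concrete, here is a small sketch of the question's pseudocode. Java is used only because the question's pseudocode is Java-like; the GameObjects holder class, the string keys, and the generic_terrain values are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** A property bag that falls back to its parent for keys it does not define. */
final class GameObject {
    private final Map<String, Object> props = new HashMap<>();
    private GameObject parent;

    GameObject inheritProperties(GameObject parent) {
        this.parent = parent;
        return this;
    }

    GameObject set(String key, Object value) {
        props.put(key, value);
        return this;
    }

    /** Looks the key up locally first, then walks up the parent chain. */
    Object get(String key) {
        if (props.containsKey(key)) {
            return props.get(key);
        }
        return parent == null ? null : parent.get(key);
    }
}

/** Hypothetical single place where every game object is defined. */
final class GameObjects {
    // Values for generic_terrain are made up; the question does not list them.
    static final GameObject GENERIC_TERRAIN = new GameObject()
            .set("MOVEABLE", true)
            .set("MOVECOST", 5);

    static final GameObject GRASS = new GameObject()
            .inheritProperties(GENERIC_TERRAIN)
            .set("NAME", "grass")
            .set("MOVECOST", 10)
            .set("IMAGE_INDEX", 1);

    private GameObjects() {}
}
```

Because all definitions live in GameObjects, swapping this class for a file-based loader later would not touch the rest of the game code.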

A lot of game engines use scripting languages for many of the reasons you mentioned. Lua is a really great, fairly fast, lightweight scripting engine that has been used to great success in a lot of games. You can easily use the parser to do simple config setting and leave it at that, or build in more functionality and let actual code be written in the file. Your example in Lua might look something like:
grass = {
NAME = "grass",
MOVEABLE = true,
MOVECOST = 10,
IMAGE_INDEX = 1
}
-- missing keys in grass fall back to generic_terrain
setmetatable(grass, { __index = generic_terrain })

If you do choose to use XML, at least try to use a sane schema. There's no reason to embed your own little schema ("object", "property", "key", "value") into XML when it's designed to represent just that stuff directly. How about:
<ball>
<shape>circle</shape>
<movable>true</movable>
<speed>10</speed>
</ball>
<cup>
<shape>rectangle</shape>
<movable>true</movable>
<speed>6</speed>
<catches>ball</catches>
</cup>
