
Using jar -tf + mvn dependency:tree to escape jar-hell



mvn dependency:tree and jar -tf are your friends when tracking down redundant classes with subtle differences that break your code at runtime.

Note that if you run into issues like this, it is likely because you haven't encapsulated your runtime properly. In a hadoop cluster, you shouldn't expect crunch, pig, hive, ... to all run in the same classpath. For example: I ran into this in an eclipse project where I was attempting to add all the ecosystem tool APIs into the same module. In retrospect, even though it was fixable, I really shouldn't have expected this to work in the first place. Normal hadoop deployments, after all, have separate lib/ directories for different tools - sort of in the OSGi sense, and that is also the way eclipse packages its plugins. Java dependencies are better off being namespaced and isolated, rather than painstakingly resolved and integrated.

Nevertheless, it's worth the effort to debug something like this at least once, just so you know how to do it ...

... So here goes...
 
So, I recently hit a cryptic "method <init>()V not found" exception when running crunch in a maven project with Avro serialization. This is by no means "avro" or "hadoop" or "crunch" specific: libraries like avro, which are used in a lot of places, can be dangerous shifting sands in your runtime environment. Other culprits for jar hell are utilities (commons, guava, guice, logging frameworks, etc...). But regardless of where your jar-hell comes from, it always looks a little something like this:
java.lang.NoSuchMethodError: org.apache.avro.mapred.AvroKey: method <init>()V not found
    at org.apache.crunch.types.avro.AvroKeyConverter.getWrapper(AvroKeyConverter.java:57)
    at org.apache.crunch.types.avro.AvroKeyConverter.outputKey(AvroKeyConverter.java:36)
    at org.apache.crunch.types.avro.AvroKeyConverter.outputKey(AvroKeyConverter.java:25)
    at org.apache.crunch.impl.mr.emit.MultipleOutputEmitter.emit(MultipleOutputEmitter.java:41)
    at org.apache.crunch.MapFn.process(MapFn.java:34)
    at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:99)
    at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:56)
    at org.apache.crunch.MapFn.process(MapFn.java:34)
    at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:99)
    at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:56)
    at org.apache.crunch.MapFn.process(MapFn.java:34)
    at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:99)
    at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:110)
    at org.apache.crunch.impl.mr.run.CrunchMapper.map(CrunchMapper.java:60)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

In my case, older avro versions didn't have the zero-argument constructor which crunch depends on. It was very frightening at first sight... But after scanning the apache/crunch github repo, and checking both the 1.3 and 1.5 avro branches, it became obvious that a change had occurred in newer avro libraries: a default empty constructor had been added to the AvroKey class.
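
By the way, a quicker way to confirm an API difference like this, without reading upstream source branches, is to point javap at the suspect jars themselves. A minimal sketch (the paths are from my ~/.m2, so adjust to taste):

cd ~/.m2/repository
# javap will happily read a class straight out of a jar given on its classpath:
javap -classpath org/apache/avro/avro-mapred/1.7.4/avro-mapred-1.7.4.jar org.apache.avro.mapred.AvroKey
# If no zero-argument "AvroKey()" constructor shows up in the listing,
# that jar can't satisfy the <init>()V call named in the stack trace.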

Now, I'll do my best for the rest of this post not to dive into the details of my specific problem - jar hell is a very generic problem, and (I think) the strategy below is general enough to solve just about any flavor of it.

How to fix it: high level.

The tricky part of all this is that there is no easy way of knowing, at runtime, which jar a particular class is being loaded from: the stack trace looks the same either way (though see the aside after the list below for one shortcut). Thus, you have to:

- scan all JARS on your classpath for the redundant class using "jar -tf <JAR_FILE> | grep <CLASS_NAME>.class".

- decide which jar you DON'T want (usually this can be done by mapping the jar version to the code).

- find which top-level pom dependency is pulling in the unwanted jar, using "mvn dependency:tree".

- exclude it using <exclusion> tags.
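
The promised aside: if you can re-run the failing program in a local JVM you control (the trace above came from LocalJobRunner, so that was an option here), the JVM itself will log where every class gets loaded from. A sketch - the jar names and the com.example.YourDriver class are just placeholders:

# -verbose:class logs the origin of every class as it is loaded:
java -verbose:class -cp 'your-app.jar:lib/*' com.example.YourDriver 2>&1 | grep 'org.apache.avro.mapred.AvroKey'
# Expect something like:
#   [Loaded org.apache.avro.mapred.AvroKey from file:/.../avro-1.4.0-cassandra-1.jar]

That isn't always practical on a real cluster, though, so on to the generic recipe.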

Detailed example:

scan all JARS

First, cd to where your jars live:

1) cd to your local maven repository (this is normally $HOME/.m2 if you don't set it explicitly).

2) for f in `find . -name '*avro*.jar'` ; do echo "checking $f" && jar -tf $f | grep "AvroKey.class" ; done

decide which jar you DON'T want
It should be obvious now which "culprit" jar you are looking at, for example, in my case:

jays-MacBook-Pro:.m2 jayunit100$ for f in `find . -name '*avro*.jar'` ; do echo "checking $f" && jar -tf $f | grep "AvroKey.class" ; done
checking ./repository/com/twitter/parquet-avro/1.2.0/parquet-avro-1.2.0-sources.jar
checking ./repository/com/twitter/parquet-avro/1.2.0/parquet-avro-1.2.0.jar
checking ./repository/org/apache/avro/avro/1.7.4/avro-1.7.4.jar
checking ./repository/org/apache/avro/avro-mapred/1.7.4/avro-mapred-1.7.4.jar
org/apache/avro/mapred/AvroKey.class

checking ./repository/org/apache/avro/trevni-avro/1.7.4/trevni-avro-1.7.4.jar
checking ./repository/org/apache/cassandra/deps/avro/1.4.0-cassandra-1/avro-1.4.0-cassandra-1-sources.jar
checking ./repository/org/apache/cassandra/deps/avro/1.4.0-cassandra-1/avro-1.4.0-cassandra-1.jar
org/apache/avro/mapred/AvroKey.class

Obviously, in my case, cassandra was somehow packaging up its own avro, which contained an old version of AvroKey.class...
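
If your local repository is large, a tighter variant is to scan only the jars maven will actually put on this project's classpath, rather than everything cached under ~/.m2. A sketch, run from the project directory:

mvn -q dependency:build-classpath -Dmdep.outputFile=cp.txt
# cp.txt now holds the resolved, colon-separated classpath; scan just those jars:
for f in `tr ':' ' ' < cp.txt` ; do echo "checking $f" && jar -tf $f | grep "AvroKey.class" ; done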


 3)  find the offending parent pom dependency
The next step is to find out WHERE maven is pulling that jar in. To do that, you can use mvn dependency:tree. This will show you exactly which top-level pom dependency is responsible for bundling the naughty jar file:
[INFO] +- org.apache.mahout:mahout-examples:jar:0.8:compile
[INFO] |  +- org.apache.mahout:mahout-integration:jar:0.8:compile
[INFO] |  |  +- org.apache.solr:solr-commons-csv:jar:3.5.0:compile
[INFO] |  |  +- org.mongodb:mongo-java-driver:jar:2.11.1:compile
[INFO] |  |  +- org.mongodb:bson:jar:2.11.1:compile
[INFO] |  |  +- org.apache.cassandra:cassandra-all:jar:1.2.5:compile
[INFO] |  |  |  +- net.jpountz.lz4:lz4:jar:1.1.0:compile
[INFO] |  |  |  +- com.ning:compress-lzf:jar:0.8.4:compile
[INFO] |  |  |  +- com.googlecode.concurrentlinkedhashmap:concurrentlinkedhashmap-lru:jar:1.3:compile
[INFO] |  |  |  +- org.apache.cassandra.deps:avro:jar:1.4.0-cassandra-1:compile
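
Tip: on a real project the full tree can be enormous, but dependency:tree can filter it for you. The includes pattern is [groupId]:[artifactId], so for my case:

mvn dependency:tree -Dincludes=org.apache.cassandra.deps:avro
# or, to match artifactId "avro" under any groupId:
mvn dependency:tree -Dincludes=:avro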

Aha! So it's mahout's fault :) Now we can exclude it.

The final step is to add an exclusion filter which removes the above jar from the transitive dependencies that maven pulls in. In my case, I chose to exclude "cassandra-all" entirely, since I don't want mahout to willy-nilly bundle anything cassandra related (I'd rather declare that in my top-level pom to begin with).

<dependency>
    <groupId>org.apache.mahout</groupId>
    <artifactId>mahout-core</artifactId>
    <version>0.8</version>
    <exclusions>
        <!--
            cassandra bundles an old avro, which conflicts with crunch's need
            for the new avro, see
            http://stackoverflow.com/questions/20951839/how-to-trace-the-origin-of-initv-failures-in-avro
        -->
        <exclusion>
            <groupId>org.apache.cassandra</groupId>
            <artifactId>cassandra-all</artifactId>
        </exclusion>
    </exclusions>
</dependency>
Fixed! Now, when you build your project, the old jar will no longer be transitively included on your classpath.
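
A quick sanity check that the exclusion actually took:

mvn dependency:tree | grep -i avro
# The org.apache.cassandra.deps:avro line should be gone, while avro-mapred 1.7.4 remains.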

Of course, this all assumes that you don't NEED references to the older jar. If you do, the jar-hell problem is much more difficult to solve: you will probably need to run your apps in separate JVMs (or otherwise isolate their classpaths), for example.

1 comment:

  1. I was stuck trying to determine why I was getting 'Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected' errors for about 3 days, even though I was using the avro-mapred-xxx-hadoop2.jar, and found that the hive-0.13.jar I had in my environment was the cause of the issue (seems like it includes a version of some avro classes compiled against MR 1.x). Thanks for the blog post - was a great help.
