22.12.11

How to read non-Writable values (not just keys) with Hadoop's SequenceFile.Reader class.

Special thanks to the DataSalt folks for this one.

This is a very, very specific post. Only those confused about Hadoop, Java, and custom serialization will find it useful. Actually, I think only programmers doing Java development at either DataSalt or Peerindex will find it useful... But I guess that will be good for my Peerindex score, because it's really specific. So that's cool.

Anyways...

So, I've been trying to read in key/value pairs from Thrift that are not Writable in Hadoop.

Oddly, when using ThriftSerialization settings, you can easily use a SequenceFile.Reader to do the following:

MyThriftClass myThriftBean = new MyThriftClass();           // the Thrift-generated bean
myThriftBean = (MyThriftClass) myReader.next(myThriftBean); // read the next key into it
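
For context, here is a rough sketch of how such a reader might be wired up. The Thrift serializer class name below is a placeholder (it depends on which Thrift serialization library you have on the classpath); the key point is that the io.serializations property must list a Serialization that can handle your Thrift beans:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

public class OpenThriftSequenceFile {
  public static SequenceFile.Reader open(String file) throws Exception {
    Configuration conf = new Configuration();
    // Register a Serialization that knows how to (de)serialize the Thrift beans.
    // "com.example.hadoop.ThriftSerialization" is a placeholder -- use whichever
    // implementation your project actually ships with.
    conf.set("io.serializations",
        "com.example.hadoop.ThriftSerialization,"
      + "org.apache.hadoop.io.serializer.WritableSerialization");
    FileSystem fs = FileSystem.get(conf);
    return new SequenceFile.Reader(fs, new Path(file), conf);
  }
}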

However, this works only because the SequenceFile.Reader class supports a

next(Object o); method. This method reads in only a key.

I'm sure we would all expect that there MUST be a corresponding method that reads in both the key and the value of a given entry in a sequence file... right? WRONG!

ODDLY: SequenceFile.Reader does NOT have a

next(Object k, Object v); method!

So - what if you want to read both the keys and the values of a SequenceFile?

You can do the following:

Object k = myKeyClass.newInstance();
Object v = myValueClass.newInstance();

// next(Object) returns the deserialized key, or null at the end of the file.
while ((k = reader.next(k)) != null)
{
    System.out.println(k); // the key has been read in already...
    v = reader.getCurrentValue(v); // now read the value before reading the next key.
    System.out.println(v);
}
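
Putting it all together, here is a minimal, self-contained sketch of the pattern (the configuration is assumed to already have the right io.serializations entry, as in the setup sketch above, and the file path comes from the command line). It uses the reader's own getKeyClass()/getValueClass() plus ReflectionUtils to instantiate the beans, so it works for any non-Writable serialization you've registered:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.util.ReflectionUtils;

public class DumpSequenceFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(); // assumes io.serializations is already set up
    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Reader reader =
        new SequenceFile.Reader(fs, new Path(args[0]), conf);
    try {
      // Ask the reader which classes the file was written with, and instantiate them.
      Object k = ReflectionUtils.newInstance(reader.getKeyClass(), conf);
      Object v = ReflectionUtils.newInstance(reader.getValueClass(), conf);
      // next(Object) returns the deserialized key, or null at the end of the file.
      while ((k = reader.next(k)) != null) {
        v = reader.getCurrentValue(v); // the value for the key we just read
        System.out.println(k + "\t" + v);
      }
    } finally {
      reader.close();
    }
  }
}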

Unfortunately, the SequenceFile.Reader documentation says that the

next(Object o)

method "skips" the value, reading only the key. However, that's not quite the case: it appears that, until you read the NEXT key, the reader still has access to the value of the entry you are currently looking at, via getCurrentValue().
