Special thanks to the DataSalt folks for this one.
This is a very, very specific post. Only those confused about Hadoop, Java, and custom serialization will find it useful. Actually, I think only programmers working on Java development at either DataSalt or Peerindex will find it useful... But I guess that will be good for my Peerindex score, because it's really specific. So that's cool.
Anyways...
So, I've been trying to read in key/value pairs of Thrift objects, which are not Writable, in Hadoop.
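When using ThriftSerialization settings (that is, with a Thrift serialization class listed in the io.serializations configuration property), you can easily use a SequenceFile.Reader to do the following:
Object myThriftBean = new MyThriftClass(); // MyThriftClass = your Thrift-generated class
myThriftBean = myReader.next(myThriftBean); // returns the key, or null at end of file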
However, this only works because the SequenceFile.Reader class has an Object next(Object key) method. This method reads in only a key.
I'm sure we would all expect that there MUST be a corresponding method that reads in both the key and the value of a given entry in a sequence file... right? WRONG!
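ODDLY: SequenceFile.Reader does NOT have a next(Object key, Object val) method! (There is a next(Writable key, Writable val), but that only helps when your keys and values are Writable.)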
So, what if you want to read both the key and the value of each entry in a SequenceFile?
You can do the following:
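// the key and value classes recorded in the SequenceFile header
Object k = reader.getKeyClass().newInstance();
Object v = reader.getValueClass().newInstance();
// next(Object) returns the key, or null at end of file (it does not return a boolean)
while ((k = reader.next(k)) != null)
{
    System.out.println(k); // the key has been read in already...
    v = reader.getCurrentValue(v); // now read the value, before reading the next key
    System.out.println(v);
}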
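Confusingly, the SequenceFile.Reader documentation says that the next(Object key) method "skips" the value, reading only the key. However, the value is not lost: it appears that, until we read the NEXT key, the reader still has access to the value of the entry you are looking at. That is what getCurrentValue(Object val), documented as returning the value corresponding to the last read key, relies on.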
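To make the calling pattern concrete, here is a minimal sketch that needs only the JDK. KeyThenValueReader is a hypothetical interface whose two methods have the same shape as SequenceFile.Reader's next(Object) and getCurrentValue(Object), and InMemoryReader is a made-up stand-in used only to exercise the loop; neither is part of Hadoop. The interesting part is forEachPair: stop when the key comes back null, and fetch each value before asking for the next key.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;

public class KeyValueLoopSketch {

    /** Hypothetical interface, shaped like the two SequenceFile.Reader
     *  methods discussed above. Not a Hadoop type. */
    interface KeyThenValueReader {
        /** Returns the next key, or null at end of file. */
        Object next(Object key) throws IOException;

        /** Returns the value belonging to the most recently read key. */
        Object getCurrentValue(Object val) throws IOException;
    }

    /** Hands every key/value pair to the callback, reading each value
     *  before asking for the next key. The previous key and value objects
     *  are passed back in so an implementation may reuse them, which is why
     *  the callback should copy anything it wants to keep. */
    static void forEachPair(KeyThenValueReader reader,
                            BiConsumer<Object, Object> callback) throws IOException {
        Object key = null;
        Object value = null;
        // end of file is signalled by a null key, not by a boolean
        while ((key = reader.next(key)) != null) {
            value = reader.getCurrentValue(value);
            callback.accept(key, value);
        }
    }

    /** Hypothetical in-memory stand-in, used only to exercise forEachPair. */
    static class InMemoryReader implements KeyThenValueReader {
        private final Iterator<Object[]> entries;
        private Object[] current;

        InMemoryReader(List<Object[]> data) {
            this.entries = data.iterator();
        }

        @Override
        public Object next(Object key) {
            current = entries.hasNext() ? entries.next() : null;
            return current == null ? null : current[0];
        }

        @Override
        public Object getCurrentValue(Object val) {
            if (current == null) {
                throw new IllegalStateException("no key has been read");
            }
            return current[1];
        }
    }

    public static void main(String[] args) throws IOException {
        List<Object[]> data = Arrays.asList(
                new Object[] {"alice", 3},
                new Object[] {"bob", 5});
        forEachPair(new InMemoryReader(data),
                (k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Running main prints "alice -> 3" and then "bob -> 5". The callback style is a deliberate choice: Hadoop's Deserializer contract allows a deserializer to refill the object it is handed, so holding on to key or value references across iterations can leave you with several references to the same, overwritten object.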