1) The InputFormat itself is chosen at runtime: the job configuration names the class, and the framework instantiates it when the job runs.
2) Through its RecordReader, the InputFormat provides an iterator-like API:
- nextKeyValue (returns a boolean: true while records remain)
- getCurrentKey
- getCurrentValue
3) The InputFormat class also provides the RecordReader and the InputSplits to the higher-level MapReduce framework, which creates the Mappers and sends individual records to them.
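The iterator-like contract in points 2 and 3 can be sketched in plain Java, without Hadoop on the classpath. This is only an illustration: `SimpleRecordReader` is a simplified stand-in for Hadoop's `RecordReader` (the real class also declares `initialize`, `getProgress` and `close`, and its methods throw checked exceptions), and `ListReader`, `ReaderDemo` and `drain` are hypothetical names made up for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for org.apache.hadoop.mapreduce.RecordReader (assumption:
// only the three iteration methods are modeled here).
interface SimpleRecordReader<K, V> {
    boolean nextKeyValue();   // advance to the next record; false when exhausted
    K getCurrentKey();
    V getCurrentValue();
}

// Hypothetical reader over an in-memory list, keyed by zero-based position.
class ListReader implements SimpleRecordReader<Integer, String> {
    private final List<String> lines;
    private int pos = -1;     // -1 means "before the first record"

    ListReader(List<String> lines) { this.lines = lines; }

    public boolean nextKeyValue() {
        if (pos + 1 >= lines.size()) return false;
        pos++;
        return true;
    }

    public Integer getCurrentKey() { return pos; }
    public String getCurrentValue() { return lines.get(pos); }
}

class ReaderDemo {
    // Roughly what the framework does on behalf of a mapper:
    // keep pulling records until nextKeyValue() reports there are none left.
    static List<String> drain(SimpleRecordReader<Integer, String> reader) {
        List<String> seen = new ArrayList<>();
        while (reader.nextKeyValue()) {
            seen.add(reader.getCurrentKey() + "=" + reader.getCurrentValue());
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(drain(new ListReader(Arrays.asList("a", "b", "c"))));
    }
}
```

The point of the boolean return is that the caller never needs to know how many records exist up front; it just loops until the reader says it is done.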
The most common InputFormat is FileInputFormat, which provides a series of InputSplits that collectively represent a whole file.
So - what if you want to generate input on the fly?
In this case, we can create our own custom InputFormat, which keeps returning key-value pairs. The number of pairs returned can be read from a configuration parameter if we want.
Here's an example:
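What follows is a minimal sketch of the idea rather than a drop-in Hadoop class. It is written in plain Java so it runs on its own: `java.util.Properties` stands in for Hadoop's `Configuration`, `GeneratingReader` mirrors the shape of a `RecordReader` without extending it, and the key `generator.record.count`, the default of 10, and the `record-N` values are all assumptions made up for illustration.

```java
import java.util.Properties;

// Plain-Java stand-in for a RecordReader that fabricates its records
// instead of reading them from a file.
class GeneratingReader {
    // Hypothetical configuration key; not a property defined by Hadoop.
    static final String COUNT_KEY = "generator.record.count";

    private final long total;   // how many pairs to produce
    private long emitted = 0;   // how many have been produced so far

    // Properties plays the role of Hadoop's Configuration in this sketch.
    GeneratingReader(Properties conf) {
        this.total = Long.parseLong(conf.getProperty(COUNT_KEY, "10"));
    }

    // Same shape as RecordReader.nextKeyValue(): true while pairs remain.
    boolean nextKeyValue() {
        if (emitted >= total) return false;
        emitted++;
        return true;
    }

    // Key is the zero-based index of the current pair.
    Long getCurrentKey() { return emitted - 1; }

    // Value is synthesized on demand; nothing is stored anywhere.
    String getCurrentValue() { return "record-" + (emitted - 1); }
}

class GeneratorDemo {
    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(GeneratingReader.COUNT_KEY, "3");

        GeneratingReader reader = new GeneratingReader(conf);
        while (reader.nextKeyValue()) {
            System.out.println(reader.getCurrentKey() + "\t" + reader.getCurrentValue());
        }
    }
}
```

In a real job, the same logic would live in a class extending `RecordReader`, and the custom InputFormat's `getSplits` would have to return at least one placeholder InputSplit, since there is no file to divide up and the framework starts one mapper per split.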