Bigtop packages the entire upstream hadoop ecosystem for us. It does this, in general, by building the same jars that come from the hadoop distros, without patching them, and converting them into rpm/deb packages.
So... that means the hadoop tarballs you're used to get split out into a linux-friendly package structure (/usr/lib/hadoop/, /etc/hadoop/conf, and so on).
What does this mean for Java developers? It means that, finally, after all these years, we'll have to learn something about how linux packages software. A maven repo is no longer enough when it comes to your bigdata applications. Remember: to run hadoop, you need a system that is very well organized and uniform across the entire cluster, and for that you really need first-class packaging.
I'll update this over time as I learn more. First we'll start off with the two "main" components that drive the creation of an RPM from java sources: do-component-build and the spec file.
Specifically, we're looking at hadoop here, but for simpler examples you can dive into the bigtop source and see how tools such as mahout are packaged.
The do-component-build file
BigTop builds jars directly from upstream project sources. No actual patching is done. The build artifacts of upstream sources are then decomposed into proper linux packages. At the heart of the packaging is the .spec file, of course. But the building of the raw artifacts for a BigTop package (i.e. the jar files, and so on) that get put into linux directories by the spec file is done in the do-component-build file. Each hadoop ecosystem component has one.
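To make that concrete, here is a minimal sketch of what a do-component-build script might look like. The real Bigtop scripts vary per component; the Maven goals, profiles, and the HADOOP_VERSION variable below are illustrative, not Bigtop's exact ones.

```bash
#!/bin/sh
# Hypothetical sketch of a do-component-build script.
set -ex

# Build the stock upstream sources (no patching), producing the same
# jars/tarballs an upstream release build would produce.
mvn clean package -DskipTests -Pdist -Dtar \
    -Dhadoop.version=${HADOOP_VERSION} "$@"

# The resulting artifacts are what hadoop.spec later carves up into
# /usr/lib/hadoop, /etc/hadoop/conf, and friends.
```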
The hadoop.spec file
As mentioned above, the actual packaging of raw apache artifacts into RPMs, which ultimately guides the way hadoop components get split up into linux directories (/etc/, /usr/lib, and so on), is done by a spec file for each component. So, let's poke around in the source for the hadoop.spec file: bigtop-packaging/src/common/rpm/SPECS/hadoop.spec.
%DEFINE (see also RPM Macros)
*Before anything, RPM specifications (like any program) define a bunch of constants.*
The %define directive in rpm declares macro expansions (a fancy word for variables). For example, later on we will see references to "etc_yarn", which is defined as /etc/yarn.
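As a hedged sketch, the top of the spec looks roughly like this; the macro names mirror the ones discussed in this post, but check hadoop.spec itself for the full, authoritative list.

```spec
# Macro definitions near the top of hadoop.spec (the real spec defines
# many more of these).
%define hadoop_name hadoop
%define etc_hadoop  /etc/hadoop
%define etc_yarn    /etc/yarn

# Later references simply expand in place, e.g.
#   install -d -m 0755 $RPM_BUILD_ROOT%{etc_yarn}
# stages files under <build root>/etc/yarn.
```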
Preamble: Defining the package metadata
*The preamble defines the metadata and main components of the installation.*
After the macros, the preamble begins. Here we can see references to the macros %define'd above, for example "hadoop_name" (which expands to hadoop). There are also several Source[0-n] definitions; we will see what those point to in a second.
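Roughly, the preamble has this shape. The hadoop_name macro and install_hadoop.sh as Source2 come from the spec as discussed in this post; the remaining field values, macro names, and Source numbering below are made up for the example.

```spec
Name:      %{hadoop_name}
Version:   %{hadoop_version}
Release:   1%{?dist}
Summary:   Hadoop is a software platform for processing vast amounts of data
License:   ASL 2.0
URL:       http://hadoop.apache.org/
Source0:   %{name}-%{hadoop_base_version}.tar.gz
Source1:   do-component-build
Source2:   install_hadoop.sh
BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root
```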
Preamble: SOURCE[...]
*Each entry in "Source..." corresponds to a file present in the packaging source code which will get installed on the target installation.*
The SOURCE directives point to a variety of oddly suffixed files, i.e. "hadoop.1", "do-component-build", etc. For each one of these, we can see how it is applied/installed by looking up the corresponding reference to its "SOURCE*" name. For example, a quick grep for "yarn.conf" references in the rpm spec file:
*The yarn.conf file ultimately is added in /etc/security/limits.d/ on the target.*
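The pattern that grep reveals looks roughly like this; the Source slot number here is hypothetical, the real spec assigns its own.

```spec
# In the preamble (slot number is hypothetical):
Source11: yarn.conf

# ...and later, in %install, the same slot is referenced to stage the file
# under the build root, which is how it ends up in /etc/security/limits.d/
# on the target box:
install -d -m 0755 $RPM_BUILD_ROOT/etc/security/limits.d
install -m 0644 %{SOURCE11} $RPM_BUILD_ROOT/etc/security/limits.d/yarn.conf
```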
BuildRequires and Requires:
The build and the installation of a program are completely different animals. In hadoop's case, there are build requirements, such as gcc, but we don't need gcc to actually run hadoop on a cluster. Conversely, we need sh-utils on a hadoop cluster, but we don't need it for compilation of hadoop. Thus, the spec carries a few different OS-specific dependency declarations.
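A hedged sketch of what those declarations look like; the package names and the SUSE conditional are illustrative, not copied from the real spec.

```spec
# Needed only to compile hadoop; never required on cluster nodes:
BuildRequires: gcc, gcc-c++, make

# Needed at runtime on every node, but not for compilation:
Requires: sh-utils, coreutils

# Distro-specific requirements are wrapped in conditionals, e.g.:
%if  %{?suse_version:1}0
Requires: insserv
%endif
```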
install_hadoop.sh and the RPM_BUILD_ROOT
Finally, we have the installation command. As you know, hadoop consists of:
- configuration files (like core-site.xml)
- executables (like hadoop)
- logs (created by the nodemanagers, resourcemanager, etc.)
Now for something interesting about RPM installers: $RPM_BUILD_ROOT. The RPM_BUILD_ROOT directory is used as the prefix to many of the arguments to install_hadoop.sh, which is called in the portion below. For example, we can see that etc_hadoop is put under "$RPM_BUILD_ROOT/". Since etc_hadoop=/etc/hadoop/, the files get staged in $RPM_BUILD_ROOT/etc/hadoop/ at build time; RPM_BUILD_ROOT is thus a throwaway staging directory that stands in for the real filesystem root, and everything placed under it is what RPM records and installs relative to / on the target machine.
*SOURCE2=install_hadoop.sh, the command that installs hadoop.*
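Putting the two together, the %build and %install sections look roughly like this. The install_hadoop.sh flag names and the lib_hadoop/etc_hadoop macro usage below are illustrative, not the real spec's exact invocation.

```spec
%build
# SOURCE1 is do-component-build: build the stock upstream jars.
bash %{SOURCE1}

%install
%__rm -rf $RPM_BUILD_ROOT
# SOURCE2 is install_hadoop.sh: lay the build artifacts out in a linux
# directory structure, rooted at the staging dir rather than at / .
# (Flag names here are illustrative.)
bash %{SOURCE2} \
  --prefix=$RPM_BUILD_ROOT \
  --conf-dir=$RPM_BUILD_ROOT%{etc_hadoop}/conf.empty \
  --lib-dir=$RPM_BUILD_ROOT%{lib_hadoop}
```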
Next, we have a series of shell calls, disguised as RPM macros.
See https://www.zarb.org/~jasonc/macros.php
Most of the macros used in this rpm are simply platform-independent ways of referencing standard unix tools like "sed", "ls", "chgrp", and so on.
Their functionality is pretty obvious: they do what their unix equivalents already do.
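For example, a few lines in that style; the hadoop-specific paths, macros, and the @HADOOP_HOME@ placeholder here are illustrative, not taken from the real spec.

```spec
# Instead of hard-coding tool paths, the spec calls macros that expand to
# the platform's own binaries (%{__install} -> /usr/bin/install,
# %{__ln_s} -> ln -s, %{__sed} -> /bin/sed, and so on).
%__install -d -m 0755 $RPM_BUILD_ROOT%{log_hadoop}
%__ln_s %{log_hadoop} $RPM_BUILD_ROOT%{lib_hadoop}/logs
%__sed -i -e 's|@HADOOP_HOME@|%{lib_hadoop}|' \
    $RPM_BUILD_ROOT%{etc_hadoop}/conf.empty/hadoop-env.sh
```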
%PRE directives
Next up, %pre directives. These define steps which precede the installation of particular components. In general, you can see that this is where the hadoop service users are created.
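A minimal sketch of such a scriptlet, assuming an hdfs sub-package and an illustrative home directory:

```spec
# Runs before the (illustrative) hdfs sub-package is laid down on disk:
%pre hdfs
getent group hadoop >/dev/null || groupadd -r hadoop
getent group hdfs   >/dev/null || groupadd -r hdfs
getent passwd hdfs  >/dev/null || \
  useradd -r -g hdfs -G hadoop -d /var/lib/hadoop-hdfs \
          -s /bin/bash -c "Hadoop HDFS" hdfs
```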
%FILES
At this point, we've defined metadata, user names, and other system-specific info about our package. So what are we missing? Files! The %files directive is probably the most important: it tells you exactly which files are being installed and where they should go. It supports globs/recursive installs as well.
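A hedged sketch of a %files section; the sub-package name, macros, and paths are illustrative:

```spec
%files hdfs
%defattr(-,root,root)
# Config files are marked so user edits survive upgrades:
%config(noreplace) %{etc_hadoop}/conf.empty/hdfs-site.xml
# Globs pull in whole families of jars/scripts:
%{lib_hadoop}/hadoop-hdfs*.jar
%{lib_hadoop}/bin/hdfs
# Directories can carry ownership/permissions for the service users:
%attr(0775,hdfs,hadoop) %dir %{log_hadoop}
```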
THAT'S ALL FOR NOW! I'll update this post once I learn more from our good friends at ASF Bigtop. In the meantime, just ping the mailing list (dev@bigtop) for specific questions.
Comment: Thanks for the great, detailed information on Bigtop RPM compilation. If possible, can you share your spec file?
Reply: Check out the source code at https://github.com/apache/bigtop, under bigtop-packaging.