22.10.13

inotify-tools: Here's what YARN does while you're getting coffee.



As awesome as HDFS is, the beauty of a FUSE-mounted FileSystem is that you can monitor everything using standard *nix utilities (well, at least as long as the operations happen locally; thanks to Martin for pointing that out, see the comments below).

Anyway, we normally FUSE-mount Gluster, so when we run Hadoop on top of Gluster, it's easy to watch what's going on behind the scenes using standard Linux utilities. Case in point: inotifywait.

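Today, while debugging some YARN operations, I got a chance to try out inotify-tools: a set of simple command-line programs (inotifywait and inotifywatch) built on the Linux kernel's inotify API, which report file events in real time and can watch whole directory trees recursively.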

FYI, this technique isn't specific to Hadoop and Gluster; I'm just using them as examples. You could just as well use inotify-tools to monitor any other file operations.

Other possible awesome uses of the inotify utilities:

- monitoring static files served by a web server, for example to confirm that the same files/directories aren't being read too often or poorly cached from a shared storage pool.

- monitoring the number of file ops occurring in the data/ directories of an RDBMS to confirm that it isn't doing too much disk I/O.

- etc etc etc...
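For the "same files being read too often" case, here's a minimal sketch, assuming you've saved `inotifywait -m -r` output in its default "DIR EVENTS FILE" format (the same format as the trace later in this post) and that no path contains spaces. It counts OPEN events per file and prints the busiest ones first. The function name `top_reads`, the web root, and the log path are all hypothetical.

```shell
#!/usr/bin/env bash
# Rank files by how often they were opened, reading saved
# `inotifywait -m -r` output (default "DIR EVENTS FILE" format) on stdin.
# Assumes paths contain no spaces, since awk splits fields on whitespace.
top_reads() {
  local n="${1:-10}"   # how many entries to print (default: 10)
  # $1 = watched dir, $2 = comma-separated events, $3 = file name (may be empty)
  awk '("," $2 ",") ~ /,OPEN,/ { count[$1 $3]++ }
       END { for (p in count) print count[p], p }' |
    LC_ALL=C sort -k1,1nr -k2,2 |
    head -n "$n"
}

# Hypothetical usage against a web root (path and log name are examples):
#   inotifywait -m -r -e open -o /tmp/www-opens.log /var/www/html/ &
#   # ...let traffic hit the server for a while, then stop the watcher...
#   top_reads 20 < /tmp/www-opens.log
```

Counting OPEN rather than ACCESS gives "how many times was this file opened", which is usually the more useful number for spotting poor caching; swapping the pattern to `/,ACCESS,/` would count individual reads instead.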

How YARN, the FileSystem, and FUSE intersect in a Gluster deployment
(You can skip this section if all you care about is installing and running inotify-tools.)

For those wondering what I mean by "file system"... Hadoop was born as two projects: a file system (HDFS) and a MapReduce framework. Hadoop's FileSystem API is an abstract interface that anyone can implement on top of a particular storage system, so that file systems other than HDFS can be used underneath MapReduce.

Fast forward a few years, and along comes YARN, which further decouples MapReduce into a generic resource manager and the MapReduce application that runs on top of it.

In any case, YARN needs a FileSystem implementation for some of its distributed data (for example, the staging/ directory).

In this particular scenario, I was trying to trace some operations occurring on the file system. Because the Gluster implementation of the Hadoop FileSystem runs on top of a FUSE mount (see https://forge.gluster.org/hadoop/glusterfs-hadoop for details), we don't need Java-specific hacks or log inspection: we can simply run standard Linux file monitoring utilities to see which FileSystem operations YARN performs under the hood on startup.

Caveat: inotify only reports file operations that originate on the local node, so you may need to run the watcher on multiple nodes. TL;DR: run it on your YARN master node to see what YARN does there on startup, or else run it on every node.
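One way to follow that caveat is a small wrapper that starts a watcher on each node and then merges the logs. This is a minimal sketch, assuming passwordless ssh to each node and inotify-tools installed everywhere; the host names, the log path, and the function names `start_watchers` and `merge_logs` are hypothetical. It relies on inotifywait's `-d` (daemon) mode, which backgrounds the watcher and writes events to the file given with `-o`.

```shell
#!/usr/bin/env bash
# inotify only sees operations made on the local node, so this sketch
# starts one watcher per node and merges the resulting logs afterwards.
# Host names, paths, and passwordless ssh access are all assumptions.
MOUNT=/mnt/glusterfs/
LOG=/tmp/inotify-trace.log

# Start a background recursive watcher on every host given as an argument.
start_watchers() {
  local host
  for host in "$@"; do
    # -d (daemon mode) backgrounds inotifywait and requires -o <file>.
    ssh "$host" "inotifywait -r -d -o $LOG $MOUNT"
  done
}

# Merge per-host logs named <host>.log into one stream on stdout,
# prefixing each event line with the host that observed it.
merge_logs() {
  local f
  for f in "$@"; do
    awk -v host="$(basename "$f" .log)" '{ print host, $0 }' "$f"
  done
}

# Hypothetical usage:
#   start_watchers yarn-master node1 node2
#   # ...start YARN, then stop each watcher (e.g. ssh <host> pkill inotifywait)...
#   for h in yarn-master node1 node2; do scp "$h:$LOG" "$h.log"; done
#   merge_logs yarn-master.log node1.log node2.log
```

The merged output is grouped by host, not interleaved by time; if ordering across nodes matters, you'd need timestamps in each log (inotifywait's `--timefmt`/`--format` options can add them) and a sort step.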


So, anyway, here's how to install inotify-tools and run it recursively against a folder:

1) First, install inotify-tools from EPEL (this example uses the EPEL 6 release package):
 
#> rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm

#> yum install inotify-tools 
 
2) Watch YARN do its thang:
 


#> inotifywait -r -m /mnt/glusterfs/
/mnt/glusterfs/ CREATE,ISDIR tmp
/mnt/glusterfs/ OPEN,ISDIR tmp
/mnt/glusterfs/ CLOSE_NOWRITE,CLOSE,ISDIR tmp
/mnt/glusterfs/tmp/ CREATE,ISDIR hadoop-yarn
/mnt/glusterfs/tmp/ OPEN,ISDIR hadoop-yarn
/mnt/glusterfs/tmp/ CLOSE_NOWRITE,CLOSE,ISDIR hadoop-yarn
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ ATTRIB,ISDIR done
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done/ ATTRIB,ISDIR 
/mnt/glusterfs/lib/ ACCESS glusterfs-hadoop.jar
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CREATE,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ ATTRIB,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ ATTRIB,ISDIR 
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ ATTRIB,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ ATTRIB,ISDIR 
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done/ OPEN,ISDIR 
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done/ CLOSE_NOWRITE,CLOSE,ISDIR 
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ OPEN,ISDIR 
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ CLOSE_NOWRITE,CLOSE,ISDIR 
/mnt/glusterfs/lib/ OPEN glusterfs-hadoop.jar
/mnt/glusterfs/lib/ ACCESS glusterfs-hadoop.jar
/mnt/glusterfs/lib/ ACCESS glusterfs-hadoop.jar
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ OPEN,ISDIR 
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ CLOSE_NOWRITE,CLOSE,ISDIR 
/mnt/glusterfs/lib/ ACCESS glusterfs-hadoop.jar
/mnt/glusterfs/lib/ ACCESS glusterfs-hadoop.jar
/mnt/glusterfs/lib/ ACCESS glusterfs-hadoop.jar
/mnt/glusterfs/lib/ ACCESS glusterfs-hadoop.jar
/mnt/glusterfs/lib/ ACCESS glusterfs-hadoop.jar
/mnt/glusterfs/lib/ ACCESS glusterfs-hadoop.jar
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ OPEN,ISDIR 
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ CLOSE_NOWRITE,CLOSE,ISDIR 
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ OPEN,ISDIR 
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ CLOSE_NOWRITE,CLOSE,ISDIR 
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ OPEN,ISDIR 
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ CLOSE_NOWRITE,CLOSE,ISDIR 
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ OPEN,ISDIR 
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ CLOSE_NOWRITE,CLOSE,ISDIR 
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ OPEN,ISDIR 
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ CLOSE_NOWRITE,CLOSE,ISDIR 
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ OPEN,ISDIR 
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ CLOSE_NOWRITE,CLOSE,ISDIR 
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done/ OPEN,ISDIR 
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done/ CLOSE_NOWRITE,CLOSE,ISDIR 
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ OPEN,ISDIR 
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ CLOSE_NOWRITE,CLOSE,ISDIR 
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ OPEN,ISDIR 
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ CLOSE_NOWRITE,CLOSE,ISDIR 
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ OPEN,ISDIR 
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ CLOSE_NOWRITE,CLOSE,ISDIR 
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ OPEN,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ OPEN,ISDIR 
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/ CLOSE_NOWRITE,CLOSE,ISDIR done_intermediate
/mnt/glusterfs/tmp/hadoop-yarn/staging/history/done_intermediate/ CLOSE_NOWRITE,CLOSE,ISDIR 
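A raw trace like the one above gets long quickly. Here's a minimal sketch for condensing it, assuming the default inotifywait output format and paths without spaces: it counts each (path, event list) pair and prints the most frequent first, so patterns like the repeated OPEN/CLOSE of done_intermediate stand out. The function name `summarize_trace` and the log path are hypothetical.

```shell
#!/usr/bin/env bash
# Condense a startup trace into "COUNT PATH EVENTS" lines, most frequent
# first, so repeated operations stand out. Reads default-format
# `inotifywait -m -r` output on stdin; assumes paths without spaces.
summarize_trace() {
  # Key on full path ($1 dir + $3 name) plus the event list ($2).
  awk '{ count[$1 $3 " " $2]++ }
       END { for (k in count) print count[k], k }' |
    LC_ALL=C sort -k1,1nr -k2
}

# Hypothetical usage (log path is an example):
#   inotifywait -r -m -o /tmp/yarn-startup.log /mnt/glusterfs/ &
#   # ...start YARN, wait for it to settle, then: kill %1
#   summarize_trace < /tmp/yarn-startup.log | head
```

If you'd rather line events up against YARN's own log timestamps, inotifywait's `--timefmt` and `--format` options (for example `--timefmt '%H:%M:%S' --format '%T %w %e %f'`) prefix each event with a time; the awk fields in this sketch would then shift by one.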

2 comments:

  1. I'm not sure if this will work for a real GlusterFS setup (e.g. trying to get notifications about changes made on other machines in the cluster), since implementing inotify is tricky for any FUSE-based or distributed filesystem, and GlusterFS is both. See BZ 812342. My guess is that this will never be implemented.

    Inotify is a feature implemented in kernel VFS, so it may seem to work for any filesystem, if the filesystem operation you are interested in is local.

    Martin B.

  2. Thanks, Martin: you're right! inotify only shows locally originated file operations on Gluster. So in this example you'll want to run it on the master node, or else on every node. I've added that caveat (in hot pink) so that it's clear to people trying this out.
