16.12.13

Pig: If using JsonStorage, beware of untyped data.

Add types to your fields from the start if you're planning on exporting them later using Pig's JsonStorage function... at least until PIG-3627 is addressed.

In other words:

When using JsonStorage, your schema should look like this: 
{f1: chararray,count: long}
 

NOT like this:

{f1: NULL,count: long}
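
One way to catch this before calling store is to scan the schema string for NULL-typed fields. The sketch below is a hypothetical helper of my own, not part of Pig: it assumes a flat schema string in the format shown above (no nested bags or tuples) and simply reports the field names whose type reads NULL.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Pig): finds fields typed NULL
// in a flat schema string such as "{f1: NULL,count: long}".
final class UntypedFieldCheck {

    /** Returns the names of all fields whose type is NULL. */
    static List<String> untypedFields(String schema) {
        String body = schema.trim();
        // Strip the surrounding braces if present.
        if (body.startsWith("{") && body.endsWith("}")) {
            body = body.substring(1, body.length() - 1);
        }
        List<String> untyped = new ArrayList<>();
        for (String field : body.split(",")) {
            // Use the last colon so names like "null::product" stay intact.
            int sep = field.lastIndexOf(':');
            if (sep < 0) {
                continue; // no type part to inspect, skip it
            }
            String name = field.substring(0, sep).trim();
            String type = field.substring(sep + 1).trim();
            if (type.equalsIgnoreCase("NULL")) {
                untyped.add(name);
            }
        }
        return untyped;
    }
}
```

Presumably you would feed it whatever schema text DESCRIBE prints for the alias you are about to store, and refuse to call JsonStorage if the returned list is not empty.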


The details:

The example is below. The change I had to make to this Pig script (originally highlighted in green) is the explicit chararray type on product, which lets JsonStorage export the fields without exploding on the NULL schema type element.
pigServer.registerQuery(
        "id_details = FOREACH csvdata GENERATE " +
                "FLATTEN" +
                    "(STRSPLIT" +
                        "(ID,',',3)) AS (drop, code, transaction) ," +
                "FLATTEN" +
                    "(STRSPLIT" +
                        /*
                         * The schema has to be defined here
                         * for any fields that are going to be exported as JSON!
                         */
                        "(DETAILS,',',5)) AS (lname, fname, date, price, product:chararray);");

pigServer.registerQuery(
        "transactions = FOREACH id_details GENERATE $0 .. ;");

pigServer.registerQuery(
        "transactionsG = group transactions by code;");

pigServer.registerQuery(
        "uniqcnt  = foreach transactionsG {" +
                "sym = transactions.product ;" +
                "dsym =  distinct sym ;" +
                "generate flatten(dsym.product) as f1:chararray, COUNT(dsym) as count ;" +
                "};");

// The error happens here when product has no declared type!
pigServer.store("uniqcnt", "/tmp/bbb" + System.currentTimeMillis(), "JsonStorage");

ERROR 1031: Incompatable field schema: declared is "f1:chararray", infered is "null::product:NULL"

So the moral of the story: if you're using JsonStorage, start out with strong types to save yourself the hassle.

I just filed a JIRA for this, https://issues.apache.org/jira/browse/PIG-3627, and we will see how it evolves over time :).

FYI, thanks a lot to https://issues.apache.org/jira/secure/ViewProfile.jspa?name=cheolsoo for helping me with this!


