Hadoop Pig: complete statements overview

The problem with the Apache Pig documentation

I have been working with Apache Pig and found that its own documentation is very messy. It is too long, so a user cannot easily get a clear view of the most basic question: what can I do with this project?

What is Apache Pig

Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.

Pig statements

Loading data

LOAD operator and the load/store functions to read data into Pig (PigStorage is the default load function)
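
As a rough illustration of what LOAD with the default PigStorage function does, the Python sketch below turns each text line into one tuple by splitting it on a field delimiter (tab by default in PigStorage) and casting each field to the type declared in the schema. The input lines, the schema, and the helper name `load_pigstorage` are hypothetical, and Python is used only to model the semantics on a small in-memory list; this is not how Pig executes LOAD.

```python
# Pig Latin equivalent (hypothetical file name and schema):
#   A = LOAD 'students' USING PigStorage('\t') AS (name:chararray, age:int, gpa:float);


def load_pigstorage(lines, casts, delimiter="\t"):
    """Hypothetical helper: one tuple per line, fields split on `delimiter` and cast."""
    relation = []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        # This sketch pads missing trailing fields with None and ignores extra ones.
        row = tuple(
            cast(fields[i]) if i < len(fields) else None
            for i, cast in enumerate(casts)
        )
        relation.append(row)
    return relation


raw = ["alice\t20\t3.5\n", "bob\t17\t2.9\n"]
students = load_pigstorage(raw, casts=(str, int, float))
print(students)  # [('alice', 20, 3.5), ('bob', 17, 2.9)]
```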

Working with data

LIMIT operator to limit the number of output tuples
DISTINCT operator to remove duplicate tuples in a relation
FILTER operator to work with tuples (rows) of data: selects tuples from a relation based on a condition
FOREACH operator to work with columns of data: generates data transformations based on the columns of a relation
GENERATE, used together with FOREACH (FOREACH ... GENERATE), to produce the columns of the output relation, including new computed ones
GROUP operator to group data in one or more relations
COGROUP, inner JOIN, and outer JOIN operators to group or join data in two or more relations
CROSS operator to compute the cross product of two or more relations
FLATTEN operator to un-nest tuples as well as bags
UNION operator to merge the contents of two or more relations
CUBE operator to perform cube/rollup operations
SPLIT operator to partition the contents of a relation into multiple relations
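
The operators above are easier to grasp with concrete data. The following Python sketch models a relation as a list of tuples and a bag as a list, then implements simplified versions of most of these operators (CUBE is left out for brevity); the Pig Latin statement each function imitates is shown in a comment. All function names (`pig_filter`, `pig_group`, and so on), aliases, and sample data are hypothetical, and the sketch ignores details such as null handling and Pig's lack of ordering guarantees. It is an analogy for the semantics, not Pig's implementation.

```python
# Simplified, in-memory models of Pig operators (hypothetical helpers, not Pig's API).
# A relation is a list of tuples; a bag inside a tuple is a plain list.
from itertools import product

def pig_filter(rel, cond):
    # B = FILTER A BY age >= 18;
    return [t for t in rel if cond(t)]

def pig_foreach(rel, gen):
    # C = FOREACH A GENERATE name, gpa;
    return [gen(t) for t in rel]

def pig_group(rel, key):
    # D = GROUP A BY age;  -> one (group, bag) tuple per distinct key
    groups = {}
    for t in rel:
        groups.setdefault(key(t), []).append(t)
    return list(groups.items())

def pig_flatten(grouped):
    # E = FOREACH D GENERATE group, FLATTEN(A);  -> one tuple per bag element
    return [(k,) + inner for k, bag in grouped for inner in bag]

def pig_distinct(rel):
    # F = DISTINCT A;  (keeps first occurrences; Pig itself does not promise an order)
    return list(dict.fromkeys(rel))

def pig_limit(rel, n):
    # G = LIMIT A 2;  (Pig does not say which n tuples unless the input is ordered)
    return rel[:n]

def pig_join(left, lkey, right, rkey):
    # J = JOIN A BY name, B BY name;  (inner join: unmatched keys are dropped)
    return [l + r for l in left for r in right if lkey(l) == rkey(r)]

def pig_cogroup(left, lkey, right, rkey):
    # CG = COGROUP A BY name, B BY name;  -> (group, bag from A, bag from B)
    keys = dict.fromkeys([lkey(t) for t in left] + [rkey(t) for t in right])
    return [
        (k, [t for t in left if lkey(t) == k], [t for t in right if rkey(t) == k])
        for k in keys
    ]

def pig_union(*rels):
    # U = UNION A, B;  (duplicates are kept)
    return [t for rel in rels for t in rel]

def pig_cross(*rels):
    # X = CROSS A, B;  -> every combination of one tuple from each relation
    return [sum(combo, ()) for combo in product(*rels)]

def pig_split(rel, **branches):
    # SPLIT A INTO minors IF age < 18, adults IF age >= 18;
    return {name: [t for t in rel if cond(t)] for name, cond in branches.items()}

students = [("alice", 20, 3.5), ("bob", 17, 2.9), ("carol", 20, 3.8)]
courses = [("alice", "math"), ("carol", "cs"), ("dave", "art")]

adults = pig_filter(students, lambda t: t[1] >= 18)
by_age = pig_group(students, key=lambda t: t[1])
counts = pig_foreach(by_age, lambda g: (g[0], len(g[1])))  # count tuples per bag
print(counts)  # [(20, 2), (17, 1)]
print(pig_join(students, lambda t: t[0], courses, lambda t: t[0]))
# [('alice', 20, 3.5, 'alice', 'math'), ('carol', 20, 3.8, 'carol', 'cs')]
```

Note how GROUP produces one tuple per key holding a bag of the original tuples, which is why it is usually followed by a FOREACH ... GENERATE that aggregates or FLATTENs that bag.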

Storing the final result

STORE operator and the load/store functions to write results to the file system (PigStorage is the default store function)
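
STORE is the mirror image of LOAD. This minimal Python sketch writes each tuple as one delimited text line, which is roughly what PigStorage produces. The helper name `store_pigstorage`, the sample tuples, and the choice to write None as an empty field are assumptions of the sketch, and it writes to an in-memory buffer rather than to the file system.

```python
import io

# Pig Latin equivalent (hypothetical alias and output path):
#   STORE B INTO 'output' USING PigStorage(',');


def store_pigstorage(relation, out, delimiter="\t"):
    """Hypothetical helper: write each tuple as one delimited text line.

    None is written as an empty field; that choice is an assumption of this sketch.
    """
    for row in relation:
        out.write(delimiter.join("" if v is None else str(v) for v in row) + "\n")


buf = io.StringIO()  # stands in for the output location
store_pigstorage([("alice", 20, 3.5), ("bob", None, 2.9)], buf, delimiter=",")
print(buf.getvalue(), end="")
# alice,20,3.5
# bob,,2.9
```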

Debugging Pig Latin

DUMP operator to display results on your terminal screen
DESCRIBE operator to review the schema of a relation
EXPLAIN operator to view the logical, physical, or MapReduce execution plans used to compute a relation
ILLUSTRATE operator to view the step-by-step execution of a series of statements
SAMPLE operator to select a random sample of data based on the specified sample size (a fraction between 0 and 1, e.g. 0.1 for 10%)
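
Of the statements above, SAMPLE is the one whose output changes between runs, because Pig documents it as probabilistic: the exact number of returned tuples is not guaranteed. The Python sketch below models that by keeping each tuple independently with probability `size`. The per-tuple coin flip is an assumed mechanism, `pig_sample` is a hypothetical helper, and its `seed` parameter exists only to make the sketch reproducible.

```python
import random

# Pig Latin equivalent (hypothetical alias):
#   S = SAMPLE A 0.1;   -- keep roughly 10% of the tuples of A


def pig_sample(relation, size, seed=None):
    """Hypothetical helper: keep each tuple independently with probability `size`.

    The per-tuple coin flip is an assumed mechanism; the number of tuples
    returned varies between runs and is only about size * len(relation).
    """
    if not 0.0 <= size <= 1.0:
        raise ValueError("size must be between 0 and 1")
    rng = random.Random(seed)  # seeded only so this sketch is reproducible
    return [row for row in relation if rng.random() < size]


data = [(i,) for i in range(1000)]
sample = pig_sample(data, 0.1, seed=42)
print(len(sample))
```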