The problem with the Apache Pig documentation
I have been doing some work with Apache Pig, and I found that its own documentation is a mess. It is too long, and a new user cannot get a clear view of the one basic thing they need: what can I do with this project?
What is Apache Pig
Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.
Pig statements
Load
LOAD operator and the load/store functions to read data into Pig (PigStorage is the default load function)
Work with data
LIMIT operator to limit the number of output tuples
DISTINCT operator to remove duplicate tuples in a relation
FILTER operator to work with tuples or rows of data. Selects tuples from a relation based on some condition
FOREACH operator to work with columns of data. Generates data transformations based on columns of data
GENERATE clause, used inside FOREACH, to produce the columns of the output relation
GROUP operator to group data in one or more relations
COGROUP, inner JOIN, and outer JOIN operators to group or join data in two or more relations
CROSS operator to compute the cross product of two or more relations
FLATTEN operator to un-nest tuples as well as bags
UNION operator to merge the contents of two or more relations
CUBE operator to perform cube/rollup operations
SPLIT operator to partition the contents of a relation into multiple relations
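To make the list above more concrete, here is a minimal sketch of how several of these operators fit together in one script. The file names (visits.tsv, profiles.tsv), the field names, and the 30-second threshold are made up for illustration; only the operators and the built-in functions (PigStorage, COUNT, SUM) come from Pig itself.

```pig
-- Hypothetical input: tab-separated lines of (user, url, duration in seconds).
visits = LOAD 'visits.tsv' USING PigStorage('\t')
         AS (user:chararray, url:chararray, duration:int);

-- FILTER works on rows: keep only the tuples that satisfy the condition.
long_visits = FILTER visits BY duration > 30;

-- FOREACH ... GENERATE works on columns: project and transform fields.
per_visit = FOREACH long_visits GENERATE user, duration / 60.0 AS minutes;

-- GROUP collects all tuples with the same key into one bag per key.
by_user = GROUP per_visit BY user;

-- Aggregate each bag, producing one output tuple per user.
totals = FOREACH by_user GENERATE group AS user,
                                  COUNT(per_visit) AS num_visits,
                                  SUM(per_visit.minutes) AS total_minutes;

-- Inner JOIN with a second (hypothetical) relation on the user field.
profiles = LOAD 'profiles.tsv' USING PigStorage('\t')
           AS (user:chararray, country:chararray);
joined = JOIN totals BY user, profiles BY user;

-- DISTINCT and LIMIT operate on whole tuples.
users = FOREACH visits GENERATE user;
unique_users = DISTINCT users;
first_ten = LIMIT unique_users 10;
```

Each statement defines a new relation from an earlier one, and nothing is executed until a relation is written out or displayed, which is what the operators in the next two sections are for.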
Storing final result
STORE operator and the load/store functions to write results to the file system (PigStorage is the default store function)
Debugging Pig Latin
DUMP operator to display results on your terminal screen
DESCRIBE operator to review the schema of a relation
EXPLAIN operator to view the logical, physical, or MapReduce execution plans to compute a relation
ILLUSTRATE operator to view the step-by-step execution of a series of statements
SAMPLE operator to select a random sample of data based on the specified sample size
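As a rough sketch of how the debugging operators are typically used before a result is stored, the script below inspects a relation and then writes it out. The input file, the field names, and the output path are assumptions chosen for illustration; the sample size passed to SAMPLE is a fraction between 0 and 1.

```pig
-- Hypothetical input: tab-separated lines of (user, url, duration in seconds).
visits = LOAD 'visits.tsv' USING PigStorage('\t')
         AS (user:chararray, url:chararray, duration:int);

-- DESCRIBE prints the schema of the relation without running a job.
DESCRIBE visits;

-- SAMPLE keeps roughly 1% of the tuples, chosen at random.
small = SAMPLE visits 0.01;

-- LIMIT plus DUMP prints a handful of tuples to the terminal.
preview = LIMIT small 20;
DUMP preview;

-- EXPLAIN shows the execution plans Pig would use to compute the relation.
EXPLAIN preview;

-- ILLUSTRATE walks a small example data set through each statement.
ILLUSTRATE visits;

-- Once the data looks right, STORE writes it to the file system
-- (here as comma-separated text under a made-up output directory).
STORE visits INTO 'output/visits' USING PigStorage(',');
```

DUMP is meant for quick checks on small relations; for full results, STORE is the usual choice.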