How do I use the ParquetPartitionFileMergeReceiver?

The ParquetPartitionFileMergeReceiver merges partitioned Parquet data from many Parquet files into a single Parquet file. Parquet is a column-oriented data storage format (similar to RCFile and ORC), and its partitioned data does not require nesting during the merge process.

Partition data generated by GenRocket Partition Receivers is expected to follow this common directory structure: /home/user/outputPath/outputSubDir/serverN/instanceN/dataFileNameN
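
To make the expected layout concrete, the following Python sketch walks a directory tree of that shape and lists the partition files a merge would pick up. It is an illustration only, not GenRocket code: the function name find_partition_files, the sample name "orders", and the ".parquet" file names are hypothetical, and it assumes that each data file name begins with the configured dataFileName value.

```python
from pathlib import Path
import tempfile


def find_partition_files(output_path, output_sub_dir, data_file_name):
    """Return files matching <root>/serverN/instanceN/dataFileNameN, sorted."""
    root = Path(output_path)
    if output_sub_dir:
        # outputSubDir is optional, so only append it when it is set.
        root = root / output_sub_dir
    # Assumption: directories are named server<N> and instance<N>, and each
    # data file name starts with the configured dataFileName value.
    return sorted(
        p for p in root.glob("server*/instance*/*")
        if p.is_file() and p.name.startswith(data_file_name)
    )


# Usage with a throwaway tree: 2 servers x 2 instances, one file in each.
with tempfile.TemporaryDirectory() as base:
    for server in (1, 2):
        for instance in (1, 2):
            folder = Path(base) / "outputSubDir" / f"server{server}" / f"instance{instance}"
            folder.mkdir(parents=True)
            (folder / f"orders{instance}.parquet").touch()

    found = find_partition_files(base, "outputSubDir", "orders")
    for path in found:
        print(path.relative_to(base))
```

The sketch only locates files. Merging their contents into one Parquet file is the Receiver's job and is not reproduced here.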

Receiver Parameters

The following parameters can be defined for the ParquetPartitionFileMergeReceiver. Items with an asterisk (*) are required. 

  • outputPath* - Defines the base path where data files to be merged are stored. 
  • outputSubDir - Defines an optional subdirectory, under the outputPath, where data files to be merged are stored. 
  • mergeSubDir - Defines a subdirectory, under the outputPath, where the merged file is to be stored. 
  • mergeFileName* - Defines the name of the file that will store the merged data. 
  • dataFileName* - Defines the name of the data files that were generated under the standard partition directory structure:
    /home/user/outputPath/outputSubDir/serverN/instanceN/dataFileNameN
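
As a rough illustration of how these parameters relate to one another, the Python sketch below models them as a small data class and derives the input directory and the merged file location. The class MergeParams, its methods, and the sample values are hypothetical and are not part of GenRocket. The sketch also assumes that the merged file is written directly under outputPath when mergeSubDir is left empty, which the description above does not state.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass
class MergeParams:
    """Hypothetical holder for the Receiver parameters described above."""
    output_path: str                      # outputPath (required)
    merge_file_name: str                  # mergeFileName (required)
    data_file_name: str                   # dataFileName (required)
    output_sub_dir: Optional[str] = None  # outputSubDir (optional)
    merge_sub_dir: Optional[str] = None   # mergeSubDir (optional)

    def validate(self):
        """Raise ValueError if any parameter marked with * is empty."""
        required = {
            "outputPath": self.output_path,
            "mergeFileName": self.merge_file_name,
            "dataFileName": self.data_file_name,
        }
        missing = [name for name, value in required.items() if not value]
        if missing:
            raise ValueError("missing required parameters: " + ", ".join(missing))

    def input_root(self):
        """Directory that holds the serverN/instanceN partition folders."""
        root = PurePosixPath(self.output_path)
        return root / self.output_sub_dir if self.output_sub_dir else root

    def merged_file(self):
        """Location of the merged file (assumed fallback: outputPath itself)."""
        root = PurePosixPath(self.output_path)
        if self.merge_sub_dir:
            root = root / self.merge_sub_dir
        return root / self.merge_file_name


# Sample values are placeholders chosen to mirror the structure above.
params = MergeParams(
    output_path="/home/user/outputPath",
    merge_file_name="merged.parquet",
    data_file_name="orders",
    output_sub_dir="outputSubDir",
    merge_sub_dir="merged",
)
params.validate()
print(params.input_root())
print(params.merged_file())
```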

Receiver Attribute Property Keys

There are no property keys necessary for this Receiver.