Gerrit analytics ETL job.

Extracts committer data using the analytics plugin and inserts it
into Elasticsearch.
Accepts the parameters
--since --until --aggregate as in the Gerrit analytics plugin
Additionally understands
--url to access a Gerrit repo
[--out] to specify an intermediate output folder for the generated
RDD (possibly for debugging)
[--elasticIndex] to specify the index/type to use

Change-Id: I62440b3cb4f2f3f3b2346eda9f62d81188264d8e

spark-gerrit-analytics-etl

Spark ETL to extract analytics data from Gerrit projects.

The job can be launched with the following parameters:

bin/spark-submit \
    --conf spark.es.nodes=company.com \
    $JARS/SparkAnalytics-assembly-1.0.jar \
    --since 2000-06-01 \
    --aggregate email_hour \
    --url http://localhost:8080 \
    -e gerrit/analytics

Parameters

  • since, until, aggregate are the same as those defined in the Gerrit analytics plugin; see: https://gerrit.googlesource.com/plugins/analytics/+/master/README.md
  • -u --url location/port of the Gerrit server from which the analytics data is extracted
  • -e --elasticIndex index/type to be loaded into Elasticsearch; if not provided, no ES export will be performed
  • -o --out folder location for storing the output as JSON files; if not provided, data is saved to <tmp>/analytics-<timestamp>, where <tmp> is the system temporary directory
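To make the since/until/aggregate parameters concrete, here is a small illustrative sketch (in Python, although the job itself is Scala/Spark) of the kind of analytics-plugin request these parameters map to. The `analytics~contributors` endpoint path is an assumption based on the Gerrit analytics plugin documentation; the project name `myproject` is hypothetical.

```python
# Sketch only: build the Gerrit analytics plugin request URL that
# corresponds to the job's --since/--until/--aggregate CLI parameters.
# The endpoint path is an assumption based on the analytics plugin README.
def contributors_url(base_url, project, since=None, until=None, aggregate=None):
    params = [("since", since), ("until", until), ("aggregate", aggregate)]
    # Keep only the parameters that were actually supplied on the CLI.
    query = "&".join(f"{k}={v}" for k, v in params if v is not None)
    url = f"{base_url}/projects/{project}/analytics~contributors"
    return f"{url}?{query}" if query else url

print(contributors_url("http://localhost:8080", "myproject",
                       since="2000-06-01", aggregate="email_hour"))
```

Omitting every optional flag simply yields the bare endpoint URL, mirroring the job's behaviour of treating all three parameters as optional.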