Gerrit analytics ETL job.

Extracts committer data using the analytics plugin and inserts it
into Elasticsearch.
Accepts the parameters
--since --until --aggregate as in the Gerrit analytics plugin
Additionally understands
--url to access a Gerrit repo
[--out] to specify an intermediate output folder for the generated
RDD (possibly for debugging)
[--elasticIndex] to specify the index/type to use

Change-Id: I62440b3cb4f2f3f3b2346eda9f62d81188264d8e

spark-gerrit-analytics-etl

Spark ETL to extract analytics data from Gerrit projects.

The job can be launched with the following parameters:

bin/spark-submit \
    --conf spark.es.nodes=company.com \
    $JARS/SparkAnalytics-assembly-1.0.jar \
    --since 2000-06-01 \
    --aggregate email_hour \
    --url http://localhost:8080 \
    -e gerrit/analytics

Parameters

  • since, until, aggregate are the same as those defined in the Gerrit analytics plugin; see: https://gerrit.googlesource.com/plugins/analytics/+/master/README.md
  • -u --url location/port of the Gerrit server from which the analytics data is extracted
  • -e --elasticIndex index/type to be loaded into Elasticsearch; if not provided, no ES export will be performed
  • -o --out folder location for storing the output as JSON files; if not provided, data is saved to <tmp>/analytics-<timestamp>, where <tmp> is the system temporary directory
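To make the since/until/aggregate parameters concrete, here is a small illustrative sketch (in Python, although the job itself is Scala/Spark) of the kind of analytics-plugin request these parameters map to. The `analytics~contributors` endpoint path is an assumption based on the Gerrit analytics plugin documentation; the project name `myproject` is hypothetical.

```python
# Sketch only: build the Gerrit analytics plugin request URL that
# corresponds to the job's --since/--until/--aggregate CLI parameters.
# The endpoint path is an assumption based on the analytics plugin README.
def contributors_url(base_url, project, since=None, until=None, aggregate=None):
    params = [("since", since), ("until", until), ("aggregate", aggregate)]
    # Keep only the parameters that were actually supplied on the CLI.
    query = "&".join(f"{k}={v}" for k, v in params if v is not None)
    url = f"{base_url}/projects/{project}/analytics~contributors"
    return f"{url}?{query}" if query else url

print(contributors_url("http://localhost:8080", "myproject",
                       since="2000-06-01", aggregate="email_hour"))
```

Omitting every optional flag simply yields the bare endpoint URL, mirroring the job's behaviour of treating all three parameters as optional.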