GeoWave Quickstart Guide Vector Demo

In the Vector Demo, we use GeoWave to ingest a large set of media/broadcast data provided by the GDELT Project and then run a Kernel Density Estimation (KDE) on it.

Set-Up Environment Variables

Download the GeoWave environment script. We will also download two .sld files to use later on in the guide.

Sandbox:

cd /mnt
sudo wget s3.amazonaws.com/geowave/latest/scripts/sandbox/quickstart/geowave-env.sh
sudo wget s3.amazonaws.com/geowave/latest/scripts/emr/quickstart/KDEColorMap.sld
sudo wget s3.amazonaws.com/geowave/latest/scripts/emr/quickstart/SubsamplePoints.sld

EMR:

cd /mnt
sudo wget s3.amazonaws.com/geowave/latest/scripts/emr/quickstart/geowave-env.sh
sudo wget s3.amazonaws.com/geowave/latest/scripts/emr/quickstart/KDEColorMap.sld
sudo wget s3.amazonaws.com/geowave/latest/scripts/emr/quickstart/SubsamplePoints.sld

This script defines a number of the variables that will be used in future commands, so we will source it here.

source /mnt/geowave-env.sh
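
Because every later command depends on these variables, it can help to confirm they are set before continuing. The following is a minimal sketch, not part of GeoWave: the function name check_geowave_env is hypothetical, and the variable names in the usage comment are simply the ones referenced by the commands in this guide.

```shell
# Report which of the named variables are unset or empty.
# check_geowave_env is a hypothetical helper, not a GeoWave command.
check_geowave_env() {
    missing=0
    for name in "$@"; do
        # Indirect lookup that works in POSIX sh: expand the variable named by $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    # Exit status is 0 when everything is set, 1 otherwise.
    return "$missing"
}

# Usage after `source /mnt/geowave-env.sh`:
#   check_geowave_env TIME_REGEX STAGING_DIR NUM_PARTITIONS HDFS_PORT RESOURCE_MAN_PORT
```

The Sandbox ingest additionally uses $GERMANY, and the EMR ingest uses $WEST, $SOUTH, $EAST and $NORTH, so those names can be appended to the list for the matching environment.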

Download GDELT Data

We will be using data from the GDELT Project in this guide. For more information, visit the GDELT Project website.

Download the GDELT data files whose names match $TIME_REGEX. Sourcing geowave-env.sh sets this to 201602 (February 2016) for the example, so make sure you have sourced the environment script before running these commands.

sudo mkdir -p "$STAGING_DIR/gdelt" && cd "$STAGING_DIR/gdelt"
sudo wget http://data.gdeltproject.org/events/md5sums
for file in $(cut -d' ' -f3 md5sums | grep "^${TIME_REGEX}") ; \
do sudo wget http://data.gdeltproject.org/events/$file ; done
md5sum -c md5sums 2>&1 | grep "^${TIME_REGEX}"
cd "$STAGING_DIR"
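
The download loop works by pulling the file-name column out of md5sums and keeping only the names that start with $TIME_REGEX. The sketch below isolates that filtering step so it can be tried without touching the network. It assumes each line of the checksum file has the form "<hash>  <file name>"; the function name list_matching_files is hypothetical, and the hashes and file names in the demonstration are placeholders rather than real GDELT entries.

```shell
# Print the file names in an md5sums-style file that match a pattern anchored
# at the start of the name. list_matching_files is a hypothetical helper.
list_matching_files() {
    sums_file=$1
    pattern=$2
    # awk splits on any run of whitespace, so the file name is always field 2,
    # regardless of how many spaces separate it from the hash.
    awk -v re="^${pattern}" '$2 ~ re { print $2 }' "$sums_file"
}

# Demonstration with placeholder hashes and illustrative file names.
demo=$(mktemp)
printf '%s  %s\n' \
    00000000000000000000000000000000 20160131.export.CSV.zip \
    11111111111111111111111111111111 20160201.export.CSV.zip \
    22222222222222222222222222222222 20160202.export.CSV.zip > "$demo"
list_matching_files "$demo" 201602
rm -f "$demo"
```

Selecting the second whitespace-separated field with awk avoids depending on the exact number of spaces between the hash and the name, which is what the third field in the cut command relies on.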

After the data has been downloaded, we are ready to set up the store and index used to ingest the data.

Config and Ingest

  1. Add a GeoWave store

    1. If using Sandbox

      geowave config addstore gdelt --gwNamespace geowave.gdelt -t hbase --zookeeper \
      sandbox.hortonworks.com:2181
    2. If using EMR Accumulo

      geowave config addstore gdelt --gwNamespace geowave.gdelt \
      -t accumulo --zookeeper $HOSTNAME:2181 --instance accumulo \
      --user geowave --password geowave
    3. If using EMR HBase

      geowave config addstore gdelt --gwNamespace geowave.gdelt \
      -t hbase --zookeeper $HOSTNAME:2181
  2. Add a spatial index

    geowave config addindex -t spatial gdelt-spatial --partitionStrategy round_robin \
    --numPartitions $NUM_PARTITIONS
  3. Ingest the data into GeoWave

    1. If using Sandbox

      geowave ingest localtogw /mnt/gdelt gdelt gdelt-spatial -f gdelt --gdelt.cql "INTERSECTS(geometry,$GERMANY)"
    2. If using EMR

      geowave ingest localtogw $STAGING_DIR/gdelt gdelt gdelt-spatial -f gdelt \
      --gdelt.cql "BBOX(geometry,${WEST},${SOUTH},${EAST},${NORTH})"

The ingest should take about 3-5 minutes. Once it has started, you can monitor HBase status at the HBase web interface, or Accumulo status at the Accumulo web interface. The ingest is complete when the command returns and the terminal prompt accepts input again.
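
The EMR ingest command restricts the data with a BBOX filter assembled from four variables. As a rough sketch, the filter string can be built and sanity-checked before it is passed to --gdelt.cql. The function name bbox_cql is hypothetical, the check assumes a box that does not cross the antimeridian, and the coordinates in the usage comment are illustrative values, not the ones defined in geowave-env.sh.

```shell
# Build the BBOX filter string from four bounds, refusing bounds that are out
# of order. bbox_cql is a hypothetical helper, not a GeoWave command.
bbox_cql() {
    west=$1; south=$2; east=$3; north=$4
    # awk performs the decimal comparison, which plain sh arithmetic cannot.
    if awk -v w="$west" -v s="$south" -v e="$east" -v n="$north" \
        'BEGIN { exit !(w < e && s < n) }'; then
        echo "BBOX(geometry,${west},${south},${east},${north})"
    else
        echo "invalid bounds: expected west < east and south < north" >&2
        return 1
    fi
}

# Usage with illustrative coordinates:
#   bbox_cql 5.87 47.27 15.04 55.06
# After sourcing the environment script:
#   bbox_cql "$WEST" "$SOUTH" "$EAST" "$NORTH"
```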

Kernel Density Estimation (KDE)

Once the ingest has completed:

  1. Add another store for the KDE.

    1. If using Sandbox

      geowave config addstore gdelt-kde --gwNamespace geowave.kde_gdelt \
      -t hbase --zookeeper $HOSTNAME:2181
    2. If using EMR Accumulo

      geowave config addstore gdelt-kde --gwNamespace geowave.kde_gdelt \
      -t accumulo --zookeeper $HOSTNAME:2181 --instance accumulo --user geowave --password geowave
    3. If using EMR HBase

      geowave config addstore gdelt-kde --gwNamespace geowave.kde_gdelt \
      -t hbase --zookeeper $HOSTNAME:2181
  2. Run the KDE analytic

    1. If using Sandbox

      geowave analytic kde --featureType gdeltevent --minLevel 5 --maxLevel 26 \
      --minSplits $NUM_PARTITIONS --maxSplits $NUM_PARTITIONS --coverageName gdeltevent_kde  \
      --hdfsHostPort sandbox.hortonworks.com:${HDFS_PORT} \
      --jobSubmissionHostPort sandbox.hortonworks.com:${RESOURCE_MAN_PORT} \
      --tileSize 1 gdelt gdelt-kde
    2. If using EMR

      geowave analytic kde --featureType gdeltevent --minLevel 5 \
      --maxLevel 26 --minSplits $NUM_PARTITIONS --maxSplits $NUM_PARTITIONS \
      --coverageName gdeltevent_kde --hdfsHostPort ${HOSTNAME}:${HDFS_PORT} \
      --jobSubmissionHostPort ${HOSTNAME}:${RESOURCE_MAN_PORT} --tileSize 1 gdelt gdelt-kde

The KDE can take 5-10 minutes to complete due to the size of the dataset. Once it starts, its progress is displayed in the terminal. You can monitor HBase status at the HBase web interface, or Accumulo status at the Accumulo web interface.
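
The Sandbox and EMR forms of the KDE command differ only in the host used for --hdfsHostPort and --jobSubmissionHostPort. The sketch below shows one way to derive both flags from the environment name. kde_host_flags is a hypothetical helper; the flag names and host names are taken from the commands above, and falling back to the hostname command when $HOSTNAME is unset is an assumption.

```shell
# Print the two host flags for `geowave analytic kde` for a given environment.
# kde_host_flags is a hypothetical helper, not a GeoWave command.
kde_host_flags() {
    case $1 in
        sandbox) host=sandbox.hortonworks.com ;;
        # Assumption: use the hostname command if the shell did not set HOSTNAME.
        emr)     host=${HOSTNAME:-$(hostname)} ;;
        *)       echo "usage: kde_host_flags sandbox|emr" >&2; return 1 ;;
    esac
    echo "--hdfsHostPort ${host}:${HDFS_PORT} --jobSubmissionHostPort ${host}:${RESOURCE_MAN_PORT}"
}

# Usage after sourcing the environment script:
#   kde_host_flags emr
```

The printed flags can be pasted into the KDE command in place of the two host options; the remaining arguments stay as shown in the steps above.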

Once the KDE has completed successfully, you can view the heatmap it generated, as well as a map of all of the ingested data points. To do this before completing the Raster Demo, proceed to GeoServer Integration and then to Interacting with the cluster. The results of both demos will still be viewable after you complete the Raster Demo.

Raster Demo

GeoServer Integration

Interacting with the cluster