Humpack whale migration routes - Interactive data visualization¶
Species description¶
My closest encounter with a humpback whale took place during a month-long kayak trip in Alaska when I was 16. The water was the color of iron or mercury under the day's clouds, and everything was silent in the way only a fog-shrouded ocean wilderness can be. Suddenly the beast broke the surface only a few tens of feet away, with an explosion of steam that took my breath away and set my heart pounding. Our 10 foot kayak was a scrap of plastic to this thing. It surfaced several more times, a few minutes apart, just long enough for the suspense to build - curious, and investigating our small pod. Then it dove, and surfaced again far away across the bay.
Humpback whales, /Megaptera novaeangliae/, are known to migrate vast distances across the ocean. Perhaps by looking at the routes they choose, we can understand something more about their preferences, and their ocean habitat.
According to NOAA fisheries, humpback whales migrate seasonally up to 5,000 miles. They travel between cold, productive waters at high latitudes (in the northen hemisphere) or low latitudes (in the southern hemisphere) where they feed in the summer, and tropical waters where they mate and calve in the winter. At least four distinct populations exist in the North Pacific, each with their own unique feeding and calving areas and migration routes. There are at least two populations in the North Atlantic, and seven in the southern hemisphere (https://www.fisheries.noaa.gov/species/humpback-whale)
Data description¶
Source for marine ecoregion boundaries: https://databasin.org/datasets/3b6b12e7bcca419990c9081c0af254a2/
Citation: Spalding MD, Fox HE, Allen GR, Davidson N, Ferdaña ZA, Finlayson M, Halpern BS, Jorge MA, Lombana A, Lourie SA, Martin KD, McManus E, Molnar J, Recchia CA, Robertson J (2007) Marine Ecoregions of the World: a bioregionalization of coast and shelf areas. BioScience 57: 573-583.
GBIF occurrences
Whale occurrences data was gathered by individuals who contributed to a variety of datasets by recording observations of whales. These datasets were then compiled by the Global Biodiversity Information Facility (GBIF) into a searchable database. My search was limited to observations that included coordinates, and that took place in 2023. It returned occurrences observations from 37 datasets. The greatest number of observations was recorded by "Happy Whale", a resource intended to benefit conservation science.
Citation: GBIF.org (23 October 2024) GBIF Occurrence Download https://doi.org/10.15468/dl.mbcaqz
Methods descriptions¶
The occurrences data are normalized by area, to account for sampling effort. This was done by counting the number of occurrences in each ecoregion, and dividing the number of occurences by the area to calculate the "density" of occurrences.
# import packages
import os
import calendar
import cartopy.crs as ccrs
from getpass import getpass
from glob import glob
import geopandas as gpd
%pip install geoviews
import geoviews as gv
import hvplot.pandas
import pandas as pd
import panel as pn
import pathlib
import pygbif.occurrences as occ
import pygbif.species as species
import warnings
warnings.filterwarnings('ignore', category=FutureWarning)
import time
import zipfile
Requirement already satisfied: geoviews in /opt/conda/lib/python3.11/site-packages (1.13.0) Requirement already satisfied: bokeh<3.6.0,>=3.5.0 in /opt/conda/lib/python3.11/site-packages (from geoviews) (3.5.2) Requirement already satisfied: cartopy>=0.18.0 in /opt/conda/lib/python3.11/site-packages (from geoviews) (0.23.0) Requirement already satisfied: holoviews>=1.16.0 in /opt/conda/lib/python3.11/site-packages (from geoviews) (1.19.1) Requirement already satisfied: numpy in /opt/conda/lib/python3.11/site-packages (from geoviews) (2.0.2) Requirement already satisfied: packaging in /opt/conda/lib/python3.11/site-packages (from geoviews) (24.1) Requirement already satisfied: panel>=1.0.0 in /opt/conda/lib/python3.11/site-packages (from geoviews) (1.5.1) Requirement already satisfied: param in /opt/conda/lib/python3.11/site-packages (from geoviews) (2.1.1) Requirement already satisfied: pyproj in /opt/conda/lib/python3.11/site-packages (from geoviews) (3.7.0) Requirement already satisfied: shapely in /opt/conda/lib/python3.11/site-packages (from geoviews) (2.0.6) Requirement already satisfied: xyzservices in /opt/conda/lib/python3.11/site-packages (from geoviews) (2024.9.0) Requirement already satisfied: Jinja2>=2.9 in /opt/conda/lib/python3.11/site-packages (from bokeh<3.6.0,>=3.5.0->geoviews) (3.1.4) Requirement already satisfied: contourpy>=1.2 in /opt/conda/lib/python3.11/site-packages (from bokeh<3.6.0,>=3.5.0->geoviews) (1.3.0) Requirement already satisfied: pandas>=1.2 in /opt/conda/lib/python3.11/site-packages (from bokeh<3.6.0,>=3.5.0->geoviews) (2.2.3) Requirement already satisfied: pillow>=7.1.0 in /opt/conda/lib/python3.11/site-packages (from bokeh<3.6.0,>=3.5.0->geoviews) (10.4.0) Requirement already satisfied: PyYAML>=3.10 in /opt/conda/lib/python3.11/site-packages (from bokeh<3.6.0,>=3.5.0->geoviews) (6.0.2) Requirement already satisfied: tornado>=6.2 in /opt/conda/lib/python3.11/site-packages (from bokeh<3.6.0,>=3.5.0->geoviews) (6.4.1) Requirement already satisfied: matplotlib>=3.5 in /opt/conda/lib/python3.11/site-packages (from cartopy>=0.18.0->geoviews) (3.9.2) Requirement already satisfied: pyshp>=2.3 in /opt/conda/lib/python3.11/site-packages (from cartopy>=0.18.0->geoviews) (2.3.1) Requirement already satisfied: colorcet in /opt/conda/lib/python3.11/site-packages (from holoviews>=1.16.0->geoviews) (3.1.0) Requirement already satisfied: pyviz-comms>=2.1 in /opt/conda/lib/python3.11/site-packages (from holoviews>=1.16.0->geoviews) (3.0.3) Requirement already satisfied: bleach in /opt/conda/lib/python3.11/site-packages (from panel>=1.0.0->geoviews) (6.1.0) Requirement already satisfied: linkify-it-py in /opt/conda/lib/python3.11/site-packages (from panel>=1.0.0->geoviews) (2.0.3) Requirement already satisfied: markdown in /opt/conda/lib/python3.11/site-packages (from panel>=1.0.0->geoviews) (3.6) Requirement already satisfied: markdown-it-py in /opt/conda/lib/python3.11/site-packages (from panel>=1.0.0->geoviews) (3.0.0) Requirement already satisfied: mdit-py-plugins in /opt/conda/lib/python3.11/site-packages (from panel>=1.0.0->geoviews) (0.4.2) Requirement already satisfied: requests in /opt/conda/lib/python3.11/site-packages (from panel>=1.0.0->geoviews) (2.32.3) Requirement already satisfied: tqdm in /opt/conda/lib/python3.11/site-packages (from panel>=1.0.0->geoviews) (4.66.5) Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.11/site-packages (from panel>=1.0.0->geoviews) (4.12.2) Requirement already satisfied: certifi in /opt/conda/lib/python3.11/site-packages (from pyproj->geoviews) (2024.8.30) Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/lib/python3.11/site-packages (from Jinja2>=2.9->bokeh<3.6.0,>=3.5.0->geoviews) (2.1.5) Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.11/site-packages (from matplotlib>=3.5->cartopy>=0.18.0->geoviews) (0.12.1) Requirement already satisfied: fonttools>=4.22.0 in /opt/conda/lib/python3.11/site-packages (from matplotlib>=3.5->cartopy>=0.18.0->geoviews) (4.54.1) Requirement already satisfied: kiwisolver>=1.3.1 in /opt/conda/lib/python3.11/site-packages (from matplotlib>=3.5->cartopy>=0.18.0->geoviews) (1.4.7) Requirement already satisfied: pyparsing>=2.3.1 in /opt/conda/lib/python3.11/site-packages (from matplotlib>=3.5->cartopy>=0.18.0->geoviews) (3.1.4) Requirement already satisfied: python-dateutil>=2.7 in /opt/conda/lib/python3.11/site-packages (from matplotlib>=3.5->cartopy>=0.18.0->geoviews) (2.9.0) Requirement already satisfied: pytz>=2020.1 in /opt/conda/lib/python3.11/site-packages (from pandas>=1.2->bokeh<3.6.0,>=3.5.0->geoviews) (2024.1) Requirement already satisfied: tzdata>=2022.7 in /opt/conda/lib/python3.11/site-packages (from pandas>=1.2->bokeh<3.6.0,>=3.5.0->geoviews) (2024.2) Requirement already satisfied: six>=1.9.0 in /opt/conda/lib/python3.11/site-packages (from bleach->panel>=1.0.0->geoviews) (1.16.0) Requirement already satisfied: webencodings in /opt/conda/lib/python3.11/site-packages (from bleach->panel>=1.0.0->geoviews) (0.5.1) Requirement already satisfied: uc-micro-py in /opt/conda/lib/python3.11/site-packages (from linkify-it-py->panel>=1.0.0->geoviews) (1.0.3) Requirement already satisfied: mdurl~=0.1 in /opt/conda/lib/python3.11/site-packages (from markdown-it-py->panel>=1.0.0->geoviews) (0.1.2) Requirement already satisfied: charset-normalizer<4,>=2 in /opt/conda/lib/python3.11/site-packages (from requests->panel>=1.0.0->geoviews) (3.3.2) Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.11/site-packages (from requests->panel>=1.0.0->geoviews) (3.10) Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/conda/lib/python3.11/site-packages (from requests->panel>=1.0.0->geoviews) (2.2.3) Note: you may need to restart the kernel to use updated packages.
/opt/conda/lib/python3.11/site-packages/dask/dataframe/__init__.py:42: FutureWarning: Dask dataframe query planning is disabled because dask-expr is not installed. You can install it with `pip install dask[dataframe]` or `conda install dask`. This will raise in a future version. warnings.warn(msg, FutureWarning)
%%bash
pip install pygbif
Requirement already satisfied: pygbif in /opt/conda/lib/python3.11/site-packages (0.6.4) Requirement already satisfied: requests>2.7 in /opt/conda/lib/python3.11/site-packages (from pygbif) (2.32.3) Requirement already satisfied: requests-cache in /opt/conda/lib/python3.11/site-packages (from pygbif) (1.2.1) Requirement already satisfied: geojson-rewind in /opt/conda/lib/python3.11/site-packages (from pygbif) (1.1.0) Requirement already satisfied: geomet in /opt/conda/lib/python3.11/site-packages (from pygbif) (1.1.0) Requirement already satisfied: appdirs>=1.4.3 in /opt/conda/lib/python3.11/site-packages (from pygbif) (1.4.4) Requirement already satisfied: matplotlib in /opt/conda/lib/python3.11/site-packages (from pygbif) (3.9.2) Requirement already satisfied: charset-normalizer<4,>=2 in /opt/conda/lib/python3.11/site-packages (from requests>2.7->pygbif) (3.3.2) Requirement already satisfied: requests>2.7 in /opt/conda/lib/python3.11/site-packages (from pygbif) (2.32.3) Requirement already satisfied: requests-cache in /opt/conda/lib/python3.11/site-packages (from pygbif) (1.2.1) Requirement already satisfied: geojson-rewind in /opt/conda/lib/python3.11/site-packages (from pygbif) (1.1.0) Requirement already satisfied: geomet in /opt/conda/lib/python3.11/site-packages (from pygbif) (1.1.0) Requirement already satisfied: appdirs>=1.4.3 in /opt/conda/lib/python3.11/site-packages (from pygbif) (1.4.4) Requirement already satisfied: matplotlib in /opt/conda/lib/python3.11/site-packages (from pygbif) (3.9.2) Requirement already satisfied: charset-normalizer<4,>=2 in /opt/conda/lib/python3.11/site-packages (from requests>2.7->pygbif) (3.3.2) Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.11/site-packages (from requests>2.7->pygbif) (3.10) Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/conda/lib/python3.11/site-packages (from requests>2.7->pygbif) (2.2.3) Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.11/site-packages (from requests>2.7->pygbif) (2024.8.30) Requirement already satisfied: click in /opt/conda/lib/python3.11/site-packages (from geomet->pygbif) (8.1.7) Requirement already satisfied: contourpy>=1.0.1 in /opt/conda/lib/python3.11/site-packages (from matplotlib->pygbif) (1.3.0) Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.11/site-packages (from matplotlib->pygbif) (0.12.1) Requirement already satisfied: fonttools>=4.22.0 in /opt/conda/lib/python3.11/site-packages (from matplotlib->pygbif) (4.54.1) Requirement already satisfied: kiwisolver>=1.3.1 in /opt/conda/lib/python3.11/site-packages (from matplotlib->pygbif) (1.4.7) Requirement already satisfied: numpy>=1.23 in /opt/conda/lib/python3.11/site-packages (from matplotlib->pygbif) (2.0.2) Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.11/site-packages (from matplotlib->pygbif) (24.1) Requirement already satisfied: pillow>=8 in /opt/conda/lib/python3.11/site-packages (from matplotlib->pygbif) (10.4.0) Requirement already satisfied: pyparsing>=2.3.1 in /opt/conda/lib/python3.11/site-packages (from matplotlib->pygbif) (3.1.4) Requirement already satisfied: python-dateutil>=2.7 in /opt/conda/lib/python3.11/site-packages (from matplotlib->pygbif) (2.9.0) Requirement already satisfied: attrs>=21.2 in /opt/conda/lib/python3.11/site-packages (from requests-cache->pygbif) (24.2.0) Requirement already satisfied: cattrs>=22.2 in /opt/conda/lib/python3.11/site-packages (from requests-cache->pygbif) (24.1.2) Requirement already satisfied: platformdirs>=2.5 in /opt/conda/lib/python3.11/site-packages (from requests-cache->pygbif) (4.3.6) Requirement already satisfied: url-normalize>=1.4 in /opt/conda/lib/python3.11/site-packages (from requests-cache->pygbif) (1.4.3) Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.11/site-packages (from python-dateutil>=2.7->matplotlib->pygbif) (1.16.0)
# Create data directory in the home folder
data_dir = os.path.join(
# Home directory
pathlib.Path.home(),
# Earth analytics data directory
'earth-analytics',
'data',
# Project directory
'whale-migration',
)
data_dir
os.makedirs(data_dir, exist_ok=True)
# Define the directory name for GBIF data
gbif_dir = os.path.join(data_dir, 'gbif')
Source for marine ecoregion boundaries: https://databasin.org/datasets/3b6b12e7bcca419990c9081c0af254a2/
Citation: Spalding MD, Fox HE, Allen GR, Davidson N, Ferdaña ZA, Finlayson M, Halpern BS, Jorge MA, Lombana A, Lourie SA, Martin KD, McManus E, Molnar J, Recchia CA, Robertson J (2007) Marine Ecoregions of the World: a bioregionalization of coast and shelf areas. BioScience 57: 573-583.
# Define eco_path
eco_path = 'Marine_ecosystems.zip!MEOW/meow_ecos.shp'
# Open up the ecoregions boundaries
eco_gdf = (
gpd.read_file(eco_path)
[['ECO_CODE', 'ECOREGION', 'geometry']]
.rename(columns={
'ECO_CODE': 'ecoregion_ID',
'ECOREGION': 'name',
})
.set_index('ecoregion_ID')
)
eco_gdf['area'] = eco_gdf.to_crs(9822).geometry.area/1000000
# Plot the ecoregions to check download
eco_gdf
name | geometry | area | |
---|---|---|---|
ecoregion_ID | |||
20192.0 | Agulhas Bank | POLYGON ((28.35993 -36.64435, 28.2835 -36.6769... | 3.420362e+06 |
20053.0 | Aleutian Islands | MULTIPOLYGON (((-173.39419 55.59807, -168.5559... | 5.386821e+06 |
20072.0 | Amazonia | POLYGON ((-41.13012 0.47319, -41.03905 0.4439,... | 2.564858e+06 |
20194.0 | Amsterdam-St Paul | POLYGON ((77.52994 -34.5229, 77.92307 -34.5450... | 2.746581e+06 |
20228.0 | Amundsen/Bellingshausen Sea | POLYGON ((-72.94222 -74.30962, -79.88636 -75.1... | 5.643643e+08 |
... | ... | ... | ... |
25034.0 | Ionian Sea | POLYGON ((18.28997 40.28222, 18.46579 40.19296... | 4.582846e+05 |
25031.0 | Aegean Sea | POLYGON ((26.80983 41.87778, 29.19196 41.03381... | 6.136867e+05 |
25036.0 | Alboran Sea | POLYGON ((0.32179 32.44572, 0.23032 32.38912, ... | 3.578930e+05 |
25035.0 | Western Mediterranean | POLYGON ((12.48402 37.99237, 11.95092 37.85696... | 1.389664e+06 |
25010.0 | High Arctic Archipelago | POLYGON ((-101.6265 75.65949, -101.6265 75.659... | 1.939780e+06 |
232 rows × 3 columns
eco_gdf.head()
name | geometry | area | |
---|---|---|---|
ecoregion_ID | |||
20192.0 | Agulhas Bank | POLYGON ((28.35993 -36.64435, 28.2835 -36.6769... | 3.420362e+06 |
20053.0 | Aleutian Islands | MULTIPOLYGON (((-173.39419 55.59807, -168.5559... | 5.386821e+06 |
20072.0 | Amazonia | POLYGON ((-41.13012 0.47319, -41.03905 0.4439,... | 2.564858e+06 |
20194.0 | Amsterdam-St Paul | POLYGON ((77.52994 -34.5229, 77.92307 -34.5450... | 2.746581e+06 |
20228.0 | Amundsen/Bellingshausen Sea | POLYGON ((-72.94222 -74.30962, -79.88636 -75.1... | 5.643643e+08 |
# Plot the ecoregions to check download
eco_gdf.plot()
<Axes: >
reset_credentials = False
# GBIF needs a username, password, and email
credentials = dict(
GBIF_USER=(input, 'GBIF username:'),
GBIF_PWD=(getpass, 'GBIF password:'),
GBIF_EMAIL=(input, 'GBIF email:'),
)
for env_variable, (prompt_func, prompt_text) in credentials.items():
# Delete credential from environment if requested
if reset_credentials and (env_variable in os.environ):
os.environ.pop(env_variable)
# Ask for credential and save to environment
if not env_variable in os.environ:
os.environ[env_variable] = prompt_func(prompt_text)
# Find out the species key used by GBIF to identify Megaptera novaeangliae
# Query species
species_info = species.name_lookup('Megaptera novaeangliae', rank='SPECIES')
# Get the first result
first_result = species_info['results'][0]
# Get the species key (nubKey)
species_key = first_result['nubKey']
# Check the result
first_result['species'], species_key
('Megaptera novaeangliae', 5220086)
# Only download once
gbif_pattern = os.path.join(gbif_dir, '*.csv')
if not glob(gbif_pattern):
# Only submit one request
if not 'GBIF_DOWNLOAD_KEY' in os.environ:
# Submit query to GBIF
gbif_query = occ.download([
f"speciesKey = {5220086}",
"hasCoordinate = True",
"year = 2023",
])
os.environ['GBIF_DOWNLOAD_KEY'] = gbif_query[0]
# Wait for the download to build
download_key = os.environ['GBIF_DOWNLOAD_KEY']
wait = occ.download_meta(download_key)['status']
while not wait=='SUCCEEDED':
wait = occ.download_meta(download_key)['status']
time.sleep(5)
# Download GBIF data
download_info = occ.download_get(
os.environ['GBIF_DOWNLOAD_KEY'],
path=data_dir)
# Unzip GBIF data
with zipfile.ZipFile(download_info['path']) as download_zip:
download_zip.extractall(path=gbif_dir)
# Find the extracted .csv file path (take the first result)
gbif_path = glob(gbif_pattern)[0]
gbif_path
'/home/jovyan/earth-analytics/data/whale-migration/gbif/0017082-241107131044228.csv'
!head -n 2 $gbif_path
gbifID datasetKey occurrenceID kingdom phylum class order family genus species infraspecificEpithet taxonRank scientificName verbatimScientificName verbatimScientificNameAuthorship countryCode locality stateProvince occurrenceStatus individualCount publishingOrgKey decimalLatitude decimalLongitude coordinateUncertaintyInMeters coordinatePrecision elevation elevationAccuracy depth depthAccuracy eventDate day month year taxonKey speciesKey basisOfRecord institutionCode collectionCode catalogNumber recordNumber identifiedBy dateIdentified license rightsHolder recordedBy typeStatus establishmentMeans lastInterpreted mediaType issue 4978476737 50c9509d-22c7-4a22-a47d-8c48425ef4a7 https://www.inaturalist.org/observations/250594013 Animalia Chordata Mammalia Cetacea Balaenopteridae Megaptera Megaptera novaeangliae SPECIES Megaptera novaeangliae (Borowski, 1781) Megaptera novaeangliae US Massachusetts PRESENT 28eb1a3f-1c15-4a95-931a-4af90ecb574d 42.030472 -70.307103 122.0 2023-09-20T11:47 20 9 2023 5220086 5220086 HUMAN_OBSERVATION iNaturalist Observations 250594013 Alex J Burchard 2024-11-05T19:18:34 CC_BY_NC_4_0 Alex J Burchard Alex J Burchard 2024-11-10T23:00:41.107Z StillImage COORDINATE_ROUNDED;TAXON_MATCH_TAXON_ID_IGNORED
# Load the GBIF data
gbif_df = pd.read_csv(
gbif_path,
delimiter='\t',
index_col='gbifID',
usecols=['gbifID', 'decimalLatitude', 'decimalLongitude', 'month']
)
gbif_df.info()
<class 'pandas.core.frame.DataFrame'> Index: 38913 entries, 4978476737 to 4011495163 Data columns (total 3 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 decimalLatitude 38913 non-null float64 1 decimalLongitude 38913 non-null float64 2 month 38886 non-null float64 dtypes: float64(3) memory usage: 1.2 MB
gbif_df
decimalLatitude | decimalLongitude | month | |
---|---|---|---|
gbifID | |||
4978476737 | 42.030472 | -70.307103 | 9.0 |
4978465436 | 66.034955 | -17.443391 | 8.0 |
4978406436 | -42.501729 | 173.686505 | 1.0 |
4978351205 | 42.222313 | -70.287645 | 8.0 |
4978336701 | 78.571389 | 8.718056 | 7.0 |
... | ... | ... | ... |
4011693247 | 20.627592 | -105.568071 | 1.0 |
4011684281 | 20.739564 | -105.362788 | 1.0 |
4011679282 | 20.657117 | -105.326612 | 1.0 |
4011507292 | 20.750622 | -105.544966 | 1.0 |
4011495163 | -62.955128 | -60.330620 | 1.0 |
38913 rows × 3 columns
gbif_gdf = (
gpd.GeoDataFrame(
gbif_df,
geometry=gpd.points_from_xy(
gbif_df.decimalLongitude,
gbif_df.decimalLatitude),
crs="EPSG:4326")
# Select the desired columns
[['month', 'geometry']]
)
gbif_gdf
month | geometry | |
---|---|---|
gbifID | ||
4978476737 | 9.0 | POINT (-70.3071 42.03047) |
4978465436 | 8.0 | POINT (-17.44339 66.03496) |
4978406436 | 1.0 | POINT (173.6865 -42.50173) |
4978351205 | 8.0 | POINT (-70.28764 42.22231) |
4978336701 | 7.0 | POINT (8.71806 78.57139) |
... | ... | ... |
4011693247 | 1.0 | POINT (-105.56807 20.62759) |
4011684281 | 1.0 | POINT (-105.36279 20.73956) |
4011679282 | 1.0 | POINT (-105.32661 20.65712) |
4011507292 | 1.0 | POINT (-105.54497 20.75062) |
4011495163 | 1.0 | POINT (-60.33062 -62.95513) |
38913 rows × 2 columns
%store eco_gdf gbif_gdf
Stored 'eco_gdf' (GeoDataFrame) Stored 'gbif_gdf' (GeoDataFrame)
gbif_ecoregion_gdf = (
eco_gdf
# Match the CRS of the GBIF data and the ecoregions
.to_crs(gbif_gdf.crs)
# Find ecoregion for each observation
.sjoin(
gbif_gdf,
how='inner',
predicate='contains')
)
gbif_ecoregion_gdf
name | geometry | area | gbifID | month | |
---|---|---|---|---|---|
ecoregion_ID | |||||
20192.0 | Agulhas Bank | POLYGON ((28.35993 -36.64435, 28.2835 -36.6769... | 3.420362e+06 | 4407552892 | 8.0 |
20192.0 | Agulhas Bank | POLYGON ((28.35993 -36.64435, 28.2835 -36.6769... | 3.420362e+06 | 4427045409 | 7.0 |
20192.0 | Agulhas Bank | POLYGON ((28.35993 -36.64435, 28.2835 -36.6769... | 3.420362e+06 | 4427045407 | 7.0 |
20192.0 | Agulhas Bank | POLYGON ((28.35993 -36.64435, 28.2835 -36.6769... | 3.420362e+06 | 4427045778 | 7.0 |
20192.0 | Agulhas Bank | POLYGON ((28.35993 -36.64435, 28.2835 -36.6769... | 3.420362e+06 | 4427048471 | 10.0 |
... | ... | ... | ... | ... | ... |
25085.0 | Gulf of Guinea South | POLYGON ((13.11166 -1.08534, 13.15398 -1.15744... | 9.361061e+05 | 4427047380 | 8.0 |
25085.0 | Gulf of Guinea South | POLYGON ((13.11166 -1.08534, 13.15398 -1.15744... | 9.361061e+05 | 4427047381 | 8.0 |
25085.0 | Gulf of Guinea South | POLYGON ((13.11166 -1.08534, 13.15398 -1.15744... | 9.361061e+05 | 4872007564 | 8.0 |
25085.0 | Gulf of Guinea South | POLYGON ((13.11166 -1.08534, 13.15398 -1.15744... | 9.361061e+05 | 4872007318 | 9.0 |
25035.0 | Western Mediterranean | POLYGON ((12.48402 37.99237, 11.95092 37.85696... | 1.389664e+06 | 4857371345 | 3.0 |
38883 rows × 5 columns
occurrence_df = (
gbif_ecoregion_gdf
#reset index
.reset_index()
# For each ecoregion, for each month...
.groupby(['ecoregion_ID', 'month'])
# ...count the number of occurrences
.agg(
occurrences=('gbifID', 'count'),
area=('area', 'first'))
)
#Normalize by area
occurrence_df['density'] = (occurrence_df.occurrences
/ occurrence_df.area
)
# Get rid of rare observations (possible misidentification?)
occurrence_df = occurrence_df[occurrence_df.occurrences>1]
occurrence_df
occurrences | area | density | ||
---|---|---|---|---|
ecoregion_ID | month | |||
20002.0 | 5.0 | 2 | 5.380355e+05 | 3.717227e-06 |
6.0 | 8 | 5.380355e+05 | 1.486891e-05 | |
7.0 | 9 | 5.380355e+05 | 1.672752e-05 | |
8.0 | 6 | 5.380355e+05 | 1.115168e-05 | |
20003.0 | 7.0 | 7 | 1.599690e+06 | 4.375847e-06 |
... | ... | ... | ... | ... |
25190.0 | 6.0 | 2 | 2.397421e+06 | 8.342299e-07 |
25199.0 | 1.0 | 3 | 7.383533e+06 | 4.063096e-07 |
5.0 | 5 | 7.383533e+06 | 6.771826e-07 | |
6.0 | 9 | 7.383533e+06 | 1.218929e-06 | |
11.0 | 3 | 7.383533e+06 | 4.063096e-07 |
425 rows × 3 columns
# Take the mean by ecoregion
mean_occurrences_by_ecoregion = (
occurrence_df
.groupby('ecoregion_ID')
.mean()
)
mean_occurrences_by_ecoregion
occurrences | area | density | |
---|---|---|---|
ecoregion_ID | |||
20002.0 | 6.250000 | 5.380355e+05 | 1.161633e-05 |
20003.0 | 4.666667 | 1.599690e+06 | 2.917231e-06 |
20004.0 | 24.000000 | 1.403924e+06 | 1.709494e-05 |
20005.0 | 21.666667 | 5.618537e+05 | 3.856283e-05 |
20018.0 | 25.500000 | 3.826869e+06 | 6.663411e-06 |
... | ... | ... | ... |
25147.0 | 2.000000 | 1.613630e+07 | 1.239441e-07 |
25152.0 | 651.571429 | 9.412568e+06 | 6.922356e-05 |
25184.0 | 7.000000 | 5.119241e+06 | 1.367390e-06 |
25190.0 | 2.000000 | 2.397421e+06 | 8.342299e-07 |
25199.0 | 5.000000 | 7.383533e+06 | 6.771826e-07 |
93 rows × 3 columns
# Take the mean by month
mean_occurrences_by_month = (
occurrence_df
.groupby('month')
.mean()
)
mean_occurrences_by_month
occurrences | area | density | |
---|---|---|---|
month | |||
1.0 | 111.517241 | 2.461128e+07 | 0.000061 |
2.0 | 94.657143 | 9.367862e+07 | 0.000045 |
3.0 | 92.100000 | 5.287040e+06 | 0.000050 |
4.0 | 60.913043 | 5.329806e+06 | 0.000081 |
5.0 | 48.827586 | 3.834827e+06 | 0.000081 |
6.0 | 69.763158 | 4.259685e+06 | 0.000091 |
7.0 | 130.108696 | 2.488975e+06 | 0.000164 |
8.0 | 124.826923 | 2.172032e+06 | 0.000155 |
9.0 | 101.404255 | 2.741738e+06 | 0.000132 |
10.0 | 71.459459 | 2.192268e+06 | 0.000104 |
11.0 | 52.843750 | 4.717788e+06 | 0.000053 |
12.0 | 88.814815 | 5.916612e+06 | 0.000062 |
# Normalize by space and time for sampling effort
occurrence_df['norm_occurrences'] = (
occurrence_df[['density']]
/ mean_occurrences_by_ecoregion[['density']]
/ mean_occurrences_by_month[['density']]
)
occurrence_df
occurrences | area | density | norm_occurrences | ||
---|---|---|---|---|---|
ecoregion_ID | month | ||||
20002.0 | 5.0 | 2 | 5.380355e+05 | 3.717227e-06 | 3952.523164 |
6.0 | 8 | 5.380355e+05 | 1.486891e-05 | 14084.535377 | |
7.0 | 9 | 5.380355e+05 | 1.672752e-05 | 8786.571689 | |
8.0 | 6 | 5.380355e+05 | 1.115168e-05 | 6179.487777 | |
20003.0 | 7.0 | 7 | 1.599690e+06 | 4.375847e-06 | 9152.678843 |
... | ... | ... | ... | ... | ... |
25190.0 | 6.0 | 2 | 2.397421e+06 | 8.342299e-07 | 11003.543263 |
25199.0 | 1.0 | 3 | 7.383533e+06 | 4.063096e-07 | 9894.385097 |
5.0 | 5 | 7.383533e+06 | 6.771826e-07 | 12351.634886 | |
6.0 | 9 | 7.383533e+06 | 1.218929e-06 | 19806.377874 | |
11.0 | 3 | 7.383533e+06 | 4.063096e-07 | 11223.694547 |
425 rows × 4 columns
%store occurrence_df
Stored 'occurrence_df' (DataFrame)
occurrence_df.reset_index().plot.scatter(
x='month', y='norm_occurrences', c='ecoregion_ID',
logy=True
)
<Axes: xlabel='month', ylabel='norm_occurrences'>
%store occurrence_df
Stored 'occurrence_df' (DataFrame)
# Simplify the geometry to speed up processing
eco_gdf.geometry = eco_gdf.simplify(.1, preserve_topology=False)
# Change the CRS to Mercator for mapping
eco_gdf = eco_gdf.to_crs(ccrs.Mercator())
# Check that the plot runs in a reasonable amount of time
eco_gdf.hvplot(geo=True, crs=ccrs.Mercator())
occurrence_df.head()
occurrences | area | density | norm_occurrences | ||
---|---|---|---|---|---|
ecoregion_ID | month | ||||
20002.0 | 5.0 | 2 | 5.380355e+05 | 0.000004 | 3952.523164 |
6.0 | 8 | 5.380355e+05 | 0.000015 | 14084.535377 | |
7.0 | 9 | 5.380355e+05 | 0.000017 | 8786.571689 | |
8.0 | 6 | 5.380355e+05 | 0.000011 | 6179.487777 | |
20003.0 | 7.0 | 7 | 1.599690e+06 | 0.000004 | 9152.678843 |
(calendar.month_name[1:])
['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
# Join the occurrences with the plotting GeoDataFrame
occurrence_gdf = eco_gdf.join(occurrence_df[['norm_occurrences']])
# Get the plot bounds so they don't change with the slider
xmin, ymin, xmax, ymax = gbif_gdf.to_crs(ccrs.Mercator()).total_bounds
month_widget = pn.widgets.DiscreteSlider(
options={
calendar.month_name[month_num]: month_num
for month_num in range(1, 12)
}
)
# Plot occurrence by ecoregion and month
migration_plot = (
occurrence_gdf
.to_crs(ccrs.Mercator())
.hvplot(
c='norm_occurrences',
groupby='month',
# Use background tiles
geo=True, crs=ccrs.Mercator(), tiles='CartoLight',
title="Humpback Whale Migration",
xlim=(xmin, xmax), ylim=(ymin, ymax),
frame_height=600,
widgets={'month': month_widget},
widget_location='bottom'
)
)
# Save the plot
migration_plot.save('whale-migration.html', embed=True)
# Show the plot
migration_plot
WARNING:bokeh.core.validation.check:W-1005 (FIXED_SIZING_MODE): 'fixed' sizing mode requires width and height to be set: figure(id='p1651', ...)
BokehModel(combine_events=True, render_bundle={'docs_json': {'3a86e9f2-1263-42d8-92e8-b764650b8764': {'version…
Plot description, discussion, conclusion¶
The interactive plot above shows how many whales were observed in a particular ecoregion, at a particular time of year. Based on the sources I referenced above, it should be possible to see the patterns discussed earlier, with more observations at far Northern or Southern latitudes during summer months, and more observations in tropical waters during winter months. The pattern is difficult to dsicern, however it is possible to see in some areas. For example, the ecoregions off the Pacific Coast of North and Central America, along with Hawaii, do show a greater concentration of whales in Alaska during the summer months (particularly July), and in Mexico during the winter months (February).
In other places, the greater concentrations of whales do not appear to fit the pattern - for example there is a large concentration of observations recorded in Antarctica during February. This may be because whale migrations are more complicated than simply moving North or South based on the season. They may be based on more complex patterns such as the migration of food species, or even patterns of human activity. Another possibility is that the north-south migrations do occur, but observations are limited due to the extremely remote nature of many of the locations in question. A third possibility is that the whales are congregating in areas not marked by the Marine Ecoregions shapefile, which covered primarily coastal ecoregions and excluded some deeper ocean territory. Further study or additional data would be required in order to make these distinctions.