GSoC Contributor’s Guide#

Attention

This program has ended for this year. Come back next year if you’d like to participate. In the meantime, check out this year’s contribution.

The Echoshader team aims to recruit talented Google Summer of Code (GSoC) participants to help us build the capability to interactively visualize large volumes of cloud-based ocean sonar data and so accelerate data exploration and discovery. This project will go hand-in-hand with ongoing development of the echopype package, which handles the standardization, pre-processing, and organization of these data. We aim to create ocean sonar data visualization functionality based on the HoloViz suite of Python tools, and to develop widgets that can be flexibly combined in Panel dashboards or used in Jupyter notebooks for interactive data visualization.

Getting Started#

The sonar data we will be working with can come from several different instruments and are stored in binary formats specific to those instruments. These binary data are difficult to work with directly and do not allow for efficient processing. We use echopype to convert the raw data into a more user-friendly structure following an interoperable netCDF data model, and to serialize the data into netCDF or Zarr formats. These standardized raw data are then calibrated to arrive at the datasets you will be working with, also in netCDF or Zarr formats.
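The conversion and calibration workflow described above can be sketched roughly as follows. This is only an illustrative sketch, not the project's actual pipeline: it assumes echopype is installed and that the raw file comes from an EK60 echosounder, and the helper name `raw_to_sv` and its path arguments are hypothetical.

```python
from pathlib import Path


def raw_to_sv(raw_path, converted_path, sonar_model="EK60"):
    """Convert one raw sonar file and calibrate it to Sv (illustrative sketch)."""
    # Imported inside the function so the helper can be defined
    # even where echopype is not installed.
    import echopype as ep

    # Parse the instrument-specific binary file into a standardized object.
    echodata = ep.open_raw(str(raw_path), sonar_model=sonar_model)

    # Serialize the standardized raw data; the file suffix picks the format.
    if Path(converted_path).suffix == ".zarr":
        echodata.to_zarr(save_path=str(converted_path))
    else:
        echodata.to_netcdf(save_path=str(converted_path))

    # Calibrate the raw data; the result is an xarray Dataset containing Sv.
    return ep.calibrate.compute_Sv(echodata)
```

As a GSoC participant you will mostly start from the calibrated output, so you may never need to run this step yourself.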

Before diving into the project, we suggest that you review the items below. We also provide some additional helpful resources and initial steps to get you started.

Storage format#

We use two formats to store the data:

  • netCDF files - the current de facto standard file format for working with multidimensional, array-oriented scientific data from climate and oceanographic research. Although it is not necessary to understand the netCDF library in its entirety, Unidata (the netCDF maintainer) provides a well-documented netCDF Python interface. This documentation describes how netCDF defines common terms such as groups, dimensions, variables, and attributes.

  • Zarr - a format for the storage of chunked, compressed arrays. Zarr has similar characteristics to netCDF, but has the added benefit of being a cloud-native data format. For this reason, Zarr is ideal for storing large datasets in the cloud. Zarr provides a great overview of its storage specifications that may be useful to read.

Data Structures#

Both netCDF and Zarr formats can be easily read with the xarray library in Python. Additionally, xarray enables efficient computation on our data, which are labelled and multi-dimensional. A fantastic xarray tutorial has been put together that describes the fundamentals of xarray. Be sure to become familiar with both DataArrays and Datasets, as they are heavily used.
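To make the distinction between the two structures concrete, here is a minimal sketch using small made-up values. The variable and dimension names are illustrative only and are not taken from the project's data.

```python
import numpy as np
import xarray as xr

# A DataArray is a single labelled N-dimensional array.
temperature = xr.DataArray(
    np.array([[10.0, 11.0, 12.0], [20.0, 21.0, 22.0]]),
    dims=("station", "time"),
    coords={"station": ["A", "B"], "time": [0, 1, 2]},
    name="temperature",
)

# A Dataset is a dict-like container of DataArrays that share coordinates.
ds = xr.Dataset({"temperature": temperature, "doubled": temperature * 2})

# Label-based selection and a reduction over a named dimension.
station_b = ds["temperature"].sel(station="B")
mean_over_time = ds["temperature"].mean(dim="time")

# Files on disk yield the same kind of object, for example
# xr.open_dataset("file.nc") for netCDF or xr.open_zarr("store.zarr")
# for Zarr (both paths are placeholders).
```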

Ocean Sonar Data: What is in the Datasets?#

For this project, you will initially be working with the output of the compute_Sv function. This is a function in echopype that computes the volume backscattering strength (Sv) from the raw data. Sv quantifies how strong the echo return is from a volume of water. The function returns an xarray Dataset with several variables that are necessary for the visualization of ocean sonar data.
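As background for choosing color scales, Sv is conventionally expressed in decibels, that is, as 10·log10 of the linear volume backscattering coefficient (in units of m^-1). The sketch below shows that conversion; the helper names are hypothetical and not part of echopype.

```python
import math


def linear_to_db(sv_linear):
    """Convert a linear volume backscattering coefficient (m^-1) to Sv in dB."""
    return 10.0 * math.log10(sv_linear)


def db_to_linear(sv_db):
    """Convert Sv in dB back to the linear volume backscattering coefficient."""
    return 10.0 ** (sv_db / 10.0)
```

A difference of 10 dB therefore corresponds to a factor of 10 in echo strength, which is why echograms are usually colored on the dB scale.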

The Dataset has the dimensions and coordinates:

  • frequency - sonar transducer frequency, with units Hz

  • ping_time - timestamp of each ping

  • range_bin - sample index along a range

The data variables of the Dataset are listed below, where items in parentheses are the dimensions of the data variables:

  • Key data variables you will be working with:

    • Sv (frequency, ping_time, range_bin) - volume backscattering strength measured from the echo

    • range (frequency, ping_time, range_bin) - the measured range of an echo in meters

  • Other variables included in this dataset. These are included so that the exact parameters used in the calibration (from raw to Sv) are recorded:

    • temperature - the temperature measurement of the water collected by the echosounder, with unit degree Celsius

    • salinity - the salinity measurement of the water collected by the echosounder, with unit PSU (practical salinity unit, approximately parts per thousand)

    • pressure - the pressure measurement of the water collected by the echosounder, with unit dbar

    • sound_speed (frequency, ping_time) - sound speed (in units m/s) for the provided temperature, salinity, and pressure

    • sound_absorption (frequency, ping_time) - sea water absorption (in units dB/m) for each frequency and ping time; this value is based on the temperature, salinity, pressure, and sound_speed

    • sa_correction (frequency) - the sa correction for each frequency

    • gain_correction (frequency) - the gain correction for each frequency

    • equivalent_beam_angle (frequency) - the beam angle for each frequency
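To get a feel for this layout before real files are available, one can build a small stand-in Dataset with the same dimension and variable names. This is a sketch with random, made-up values (including the two frequencies and the 0.2 m sample spacing); it only mimics the structure described above and includes a subset of the variables.

```python
import numpy as np
import xarray as xr

# Small, made-up coordinates; real datasets are far larger.
frequency = np.array([38_000.0, 120_000.0])  # Hz
ping_time = np.datetime64("2022-01-01T00:00:00") + np.arange(5) * np.timedelta64(1, "s")
range_bin = np.arange(4)

shape = (frequency.size, ping_time.size, range_bin.size)
rng = np.random.default_rng(0)
dims_3d = ("frequency", "ping_time", "range_bin")

ds = xr.Dataset(
    data_vars={
        # Random placeholder values standing in for Sv.
        "Sv": (dims_3d, rng.uniform(-90.0, -30.0, shape)),
        # Assumed 0.2 m spacing between samples, identical for every ping.
        "range": (dims_3d, np.broadcast_to(range_bin * 0.2, shape).copy()),
        "sound_speed": (("frequency", "ping_time"), np.full(shape[:2], 1500.0)),
        "sa_correction": (("frequency",), np.zeros(frequency.size)),
    },
    coords={"frequency": frequency, "ping_time": ping_time, "range_bin": range_bin},
)

# Printing the Dataset lists its dimensions, coordinates, and variables.
print(ds)

# Selecting one frequency leaves a 2-D (ping_time, range_bin) array.
sv_38k = ds["Sv"].sel(frequency=38_000.0)
```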

Visualizing Ocean Sonar Data#

Using the above Dataset we can visualize the strength of the echoes (often called the echogram) by plotting Sv along ping_time and range_bin (here, an inverse water depth measure) axes, where the water surface is near the top of the image (the bright red line):

echogram example 1

By compiling several of these echograms and processing the data further, one can visualize the data over several hours. This can yield visualizations such as the image below, which shows the daily vertical migration of zooplankton in the water column – including the impact of a solar eclipse on this migration!

echogram example 2

Additional Resources#

Some useful resources for getting started with the proposed visualization tools:

  • Getting started with HoloViz

  • Useful resources and example dashboards in Panel

Initial Steps to Become Familiar with the Data and Visualizations#

  1. Read the example files provided in TBD using xarray

  2. Construct a widget that displays the Sv variable with ping_time as the x coordinate and range_bin as the y coordinate

  3. Improve the widget by allowing the user to change the frequency and the colormap

  4. Explore the desired types of visualization – these are issues labeled with gsoc 2022 wanted

  5. Become familiar with the notebook examples provided in TBD.
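Steps 2 and 3 could be approached along the following lines. This is one possible sketch, not a reference solution: it assumes hvplot and Panel are installed, uses random stand-in data because the example files are not yet linked, and the function names and colormap choices are hypothetical.

```python
import numpy as np
import xarray as xr


def make_demo_sv(n_ping=50, n_range=100):
    """Create a random stand-in for the Sv variable (not real sonar data)."""
    rng = np.random.default_rng(0)
    frequency = np.array([38_000.0, 120_000.0])
    ping_time = np.datetime64("2022-01-01T00:00:00") + np.arange(n_ping) * np.timedelta64(1, "s")
    sv = rng.uniform(-90.0, -30.0, size=(frequency.size, n_ping, n_range))
    return xr.Dataset(
        {"Sv": (("frequency", "ping_time", "range_bin"), sv)},
        coords={
            "frequency": frequency,
            "ping_time": ping_time,
            "range_bin": np.arange(n_range),
        },
    )


def echogram(ds, frequency, cmap="viridis"):
    """Return an interactive image of Sv for a single frequency."""
    import hvplot.xarray  # noqa: F401  (registers the .hvplot accessor)

    sv = ds["Sv"].sel(frequency=frequency)
    return sv.hvplot.image(x="ping_time", y="range_bin", cmap=cmap)


def dashboard(ds):
    """Combine frequency and colormap selectors with the echogram."""
    import panel as pn

    freq_select = pn.widgets.Select(
        name="Frequency (Hz)", options=[float(f) for f in ds["frequency"].values]
    )
    cmap_select = pn.widgets.Select(
        name="Colormap", options=["viridis", "plasma", "inferno"]
    )

    def view(frequency, cmap):
        return echogram(ds, frequency, cmap)

    # pn.bind calls view again whenever either widget value changes.
    plot = pn.bind(view, freq_select, cmap_select)
    return pn.Column(pn.Row(freq_select, cmap_select), plot)
```

In a Jupyter notebook, evaluating `dashboard(make_demo_sv())` in a cell should display the widget. Once the example files are available, the stand-in Dataset can be replaced with one opened through xarray.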

Brainstorm with us#

In the Issues section of this repository, we list some visualization ideas from mentors. We encourage you as a GSoC participant to propose your own original project ideas by creating a new issue in this repo.

Please sign up as a GSoC participant. Once the official application opens, please submit your proposals based on the Echoshader GSoC Proposal template.

Questions?#

For project-related questions, feel free to raise an issue.

Have more questions about being a GSoC mentor or participant? Check out the GSoC mentor & participant guides.

The Mentor Team#

The GSoC 2022 mentor team consists of Brandon Reyes (@b-reyes), Emilio Mayorga (@emiliom), Wu-Jung Lee (@leewujung), Don Setiawan (@lsetiawan), and Valentina Staneva (@valentina-s) of the Echospace group at the University of Washington in Seattle. We are a diverse group of researchers whose work centers on extracting knowledge from large volumes of ocean acoustic data, which contain rich information about animals ranging from zooplankton and fish to marine mammals. Integrating physics-based models and data-driven methods, our current work focuses on mining water column sonar data and spans a broad spectrum, from developing computational methods and building open source software and cloud applications to jointly analyzing acoustic observations and ocean environmental variables.