pyRSKTools¶
Introduction¶
pyRSKTools is a simple Python toolbox to open RSK SQLite files generated by RBR instruments. Its functionality is read-only. It is a partial port of the MATLAB-based RSKTools and is in the initial stages of development.
pyRSKTools targets Python 3.
What does version 0.y.z mean?¶
pyRSKTools is in initial development. Per SemVer, the API is liable to change drastically and suddenly. Additionally, data returned may be inaccurate, incomplete, or just plain incorrect. Think we’re doing the wrong thing? Think we could do something better? Tell us: now is the best time to fix it.
Installing¶
PyPI should have a reasonably up-to-date version available:
$ pip3 install pyrsktools
If you want to install the latest development version from source, you can do that too:
$ git clone https://bitbucket.org/rbr/pyrsktools
$ cd pyrsktools
$ make install
$ # Or, if you prefer to do it yourself...
$ pip3 install -e .
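Either way, a quick import check confirms the package is importable (just a sanity check; it assumes nothing beyond a standard install):
$ python3 -c "import pyrsktools; print('ok')"
ok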
Testing¶
A clean bill of health from both unit tests and the linter is required for a successful automated build.
Unit Tests¶
To test the project, set the RSK environment variable to point at an RSK file:
$ export RSK=~/Downloads/some_rsk.rsk
Then run the unit tests:
$ make test
Using¶
The Essentials¶
Opening a Dataset¶
The module includes a function to open an RSK file:
>>> import pyrsktools # Import the library
>>> rsk = pyrsktools.open('some_rsk.rsk') # Load up an RSK
This returns an RSK object against which all other library operations are performed.
Once you’re finished with the dataset, it should be closed:
>>> rsk.close()
open can also be used with the with statement:
>>> with pyrsktools.open('some_rsk.rsk') as rsk:
...     # Do something with the RSK. It will be automatically closed
...     # at the end of the block.
...     pass
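As a sketch of what could go inside the block, here is one possibility that uses only attributes described later in this document; the values shown are from the example file used throughout:
>>> with pyrsktools.open('some_rsk.rsk') as rsk:
...     print(rsk.name)           # The RSK filename (see "Metadata" below)
...     print(len(rsk.channels))  # Number of channels on the instrument
...
080281_20150911_1112.rsk
10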
What’s Inside?¶
Metadata¶
The RSK provides some basic metadata about itself:
>>> rsk.name # What was the filename of the RSK?
'080281_20150911_1112.rsk'
the instrument that recorded it:
>>> rsk.instrument # What instrument was used?
Instrument(serial=80281, model='RBRmaestro', firmware_version='1.2', firmware_type=103)
>>> rsk.channels # What channels were present on the instrument?
OrderedDict([('conductivity_00', Channel(id=1, key='cond06', label='conductivity_00', name='Conductivity', units='mS/cm', derived=False)), ('temperature_00', Channel(id=2, key='temp09', label='temperature_00', name='Temperature', units='°C', derived=False)), ('pressure_00', Channel(id=3, key='pres19', label='pressure_00', name='Pressure', units='dbar', derived=False)), ('oxygensaturation_00', Channel(id=4, key='doxy09', label='oxygensaturation_00', name='Dissolved O₂', units='%', derived=False)), ('chlorophyll_00', Channel(id=5, key='fluo10', label='chlorophyll_00', name='Chlorophyll a', units='µg/l', derived=False)), ('cdom_00', Channel(id=6, key='fluo11', label='cdom_00', name='CDOM', units='ppb', derived=False)), ('turbidity_00', Channel(id=7, key='turb01', label='turbidity_00', name='Turbidity', units='NTU', derived=False)), ('seapressure_00', Channel(id=8, key='pres08', label='seapressure_00', name='Sea pressure', units='dbar', derived=True)), ('depth_00', Channel(id=9, key='dpth01', label='depth_00', name='Depth', units='m', derived=True)), ('salinity_00', Channel(id=10, key='sal_00', label='salinity_00', name='Salinity', units='PSU', derived=True))])
and the deployment:
>>> rsk.deployment
Deployment(id=1, comment='', logger_status=None, logger_time_drift=0, download_time=datetime.datetime(2015, 9, 11, 7, 12, 30, 905000, tzinfo=datetime.timezone.utc), name='080281_20150911_1112.rsk', sample_size=6711588)
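Since channels is an ordered mapping of Channel tuples, individual fields are easy to pull out; for example (a minimal sketch using only the attributes shown above):
>>> rsk.channels['temperature_00'].units
'°C'
>>> [c.name for c in rsk.channels.values() if c.derived]  # Derived channels only
['Sea pressure', 'Depth', 'Salinity']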
Samples¶
But you probably care most about the sample data. Samples can be accessed in two ways. They are always available iteratively, via a generator:
>>> rsk.samples()
<generator object RSK.samples at 0x10741bf10>
>>> import itertools
>>> for sample in itertools.islice(rsk.samples(), 3):
... sample
...
Sample(timestamp=datetime.datetime(2015, 8, 29, 8, 28, 9, 333000, tzinfo=datetime.timezone.utc), conductivity_00=50.468727111816406, temperature_00=28.92376708984375, pressure_00=19.332664489746094, oxygensaturation_00=103.3949203491211, chlorophyll_00=0.128173828125, cdom_00=0.0048828125, turbidity_00=1.0341796875, seapressure_00=9.200664520263672, depth_00=9.144135475158691, salinity_00=30.400436401367188)
Sample(timestamp=datetime.datetime(2015, 8, 29, 8, 28, 9, 500000, tzinfo=datetime.timezone.utc), conductivity_00=50.469181060791016, temperature_00=28.92388916015625, pressure_00=19.370471954345703, oxygensaturation_00=103.39202880859375, chlorophyll_00=0.1822509765625, cdom_00=0.082763671875, turbidity_00=1.0419921875, seapressure_00=9.238471984863281, depth_00=9.181711196899414, salinity_00=30.40065574645996)
Sample(timestamp=datetime.datetime(2015, 8, 29, 8, 28, 9, 667000, tzinfo=datetime.timezone.utc), conductivity_00=50.468833923339844, temperature_00=28.92388916015625, pressure_00=19.395383834838867, oxygensaturation_00=103.46443939208984, chlorophyll_00=0.24127197265625, cdom_00=0.116455078125, turbidity_00=1.0439453125, seapressure_00=9.263383865356445, depth_00=9.206469535827637, salinity_00=30.400409698486328)
Or, if NumPy is available, they can be retrieved into a structured array:
>>> rsk.npsamples()
array([ (datetime.datetime(2015, 8, 29, 8, 28, 9, 333000, tzinfo=datetime.timezone.utc), 5.04687271e+01, 28.92376709, 19.33266449, 103.39492035, 0.12817383, 0.00488281, 1.03417969, 9.20066452, 9.14413548, 3.04004364e+01),
(datetime.datetime(2015, 8, 29, 8, 28, 9, 500000, tzinfo=datetime.timezone.utc), 5.04691811e+01, 28.92388916, 19.37047195, 103.39202881, 0.18225098, 0.08276367, 1.04199219, 9.23847198, 9.1817112 , 3.04006557e+01),
(datetime.datetime(2015, 8, 29, 8, 28, 9, 667000, tzinfo=datetime.timezone.utc), 5.04688339e+01, 28.92388916, 19.39538383, 103.46443939, 0.24127197, 0.11645508, 1.04394531, 9.26338387, 9.20646954, 3.04004097e+01),
...,
(datetime.datetime(2015, 9, 11, 7, 11, 26, 833000, tzinfo=datetime.timezone.utc), 5.70757780e-04, 21.03649902, 10.05975151, 105.14163208, -0.0090332 , -0.18981934, -0.04785156, -0.07224846, -0.07180457, 1.03646256e-02),
(datetime.datetime(2015, 9, 11, 7, 11, 27, tzinfo=datetime.timezone.utc), 9.17379744e-04, 21.03649902, 10.06026268, 105.09892273, -0.01196289, -0.18041992, -0.0390625 , -0.07173729, -0.07129654, 1.03617543e-02),
(datetime.datetime(2015, 9, 11, 7, 11, 27, 167000, tzinfo=datetime.timezone.utc), -3.77910328e-04, 21.03656006, 10.06090927, 105.13765717, -0.01171875, -0.17102051, -0.02880859, -0.0710907 , -0.07065392, 0.00000000e+00)],
dtype=[('timestamp', 'O'), ('conductivity_00', '<f8'), ('temperature_00', '<f8'), ('pressure_00', '<f8'), ('oxygensaturation_00', '<f8'), ('chlorophyll_00', '<f8'), ('cdom_00', '<f8'), ('turbidity_00', '<f8'), ('seapressure_00', '<f8'), ('depth_00', '<f8'), ('salinity_00', '<f8')])
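Because this is an ordinary NumPy structured array, the usual column-wise operations apply; for example (a sketch assuming NumPy is imported as np; no output is shown because the results depend on your dataset):
>>> import numpy as np
>>> data = rsk.npsamples()
>>> mean_temperature = np.nanmean(data['temperature_00'])  # Mean over the whole dataset
>>> max_pressure = data['pressure_00'].max()               # Deepest pressure reading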
Data returned from both the samples and npsamples functions can be time-limited via the start_time and end_time named arguments:
>>> from datetime import datetime, timezone
>>> rsk.npsamples(start_time=datetime(2015, 9, 2, tzinfo=timezone.utc),
... end_time=datetime(2015, 9, 3, tzinfo=timezone.utc))
array([ (datetime.datetime(2015, 9, 2, 0, 0, tzinfo=datetime.timezone.utc), 50.78747177, 23.58898926, 97.16591644, 2.78220224, 0.44580078, 0.55810547, -0.15380859, 87.03392029, 86.49918365, 34.35256195),
(datetime.datetime(2015, 9, 2, 0, 0, 0, 167000, tzinfo=datetime.timezone.utc), 50.81333542, 23.61236572, 97.32424164, 2.77061081, 0.44104004, 0.54382324, -0.15332031, 87.19224548, 86.65653992, 34.35404587),
(datetime.datetime(2015, 9, 2, 0, 0, 0, 333000, tzinfo=datetime.timezone.utc), 50.84253311, 23.61846924, 97.51081848, 2.74815583, 0.42736816, 0.5559082 , -0.15722656, 87.3788147 , 86.84196472, 34.3714447 ),
...,
(datetime.datetime(2015, 9, 2, 23, 59, 59, 500000, tzinfo=datetime.timezone.utc), 47.95243073, 20.36456299, 113.82951355, 0.53426731, 0.95812988, 0.65759277, -0.08203125, 103.69750977, 103.06039429, 34.69152832),
(datetime.datetime(2015, 9, 2, 23, 59, 59, 667000, tzinfo=datetime.timezone.utc), 47.94488144, 20.36938477, 113.88066864, 0.53906661, 0.96099854, 0.65661621, -0.08105469, 103.74867249, 103.1112442 , 34.681427 ),
(datetime.datetime(2015, 9, 2, 23, 59, 59, 833000, tzinfo=datetime.timezone.utc), 47.93851089, 20.36968994, 113.9302597 , 0.53700191, 0.96630859, 0.63171387, -0.06982422, 103.79826355, 103.16053009, 34.67597961)],
dtype=[('timestamp', 'O'), ('conductivity_00', '<f8'), ('temperature_00', '<f8'), ('pressure_00', '<f8'), ('oxygensaturation_00', '<f8'), ('chlorophyll_00', '<f8'), ('cdom_00', '<f8'), ('turbidity_00', '<f8'), ('seapressure_00', '<f8'), ('depth_00', '<f8'), ('salinity_00', '<f8')])
The values given are expected to be datetime objects.
Sample data is intended to be easily explorable through the use of named fields:
>>> # The average of the first 6,000 temperature values:
>>> temperatures = [row.temperature_00 for row in itertools.islice(rsk.samples(), 6_000)]
>>> sum(temperatures) / len(temperatures)
26.74410189819336
>>> # All salinity values from the start of the dataset to a cutoff date:
>>> rsk.npsamples(end_time=datetime(2015, 8, 29, 13, 0, tzinfo=timezone.utc))['salinity_00']
array([ 30.4004364 , 30.40065575, 30.4004097 , ..., 32.32573318,
32.32009506, 32.31411743])
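Named fields also work for boolean selection on the structured array; for instance, keeping only samples recorded deeper than some cutoff (a sketch; the 50 m threshold is arbitrary and no output is shown):
>>> data = rsk.npsamples()
>>> deep = data[data['depth_00'] > 50.0]  # Samples deeper than 50 m
>>> deep_salinity = deep['salinity_00']   # Salinity at those depths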
Geographic¶
The iOS and Android apps collect GPS geodata, which is accessible via the geodata function:
>>> rsk.geodata()
<generator object RSK.geodata at 0x1245cdf48>
>>> import itertools
>>> for geo in itertools.islice(rsk.geodata(), 3):
... geo
...
Geo(timestamp=datetime.datetime(2019, 9, 27, 16, 24, 45, 145000, tzinfo=datetime.timezone.utc), latitude=48.66791163945951, longitude=-123.38619064505487, accuracy=7.074523989379777, accuracyType='HoriPhone')
Geo(timestamp=datetime.datetime(2019, 9, 27, 16, 24, 46, 145000, tzinfo=datetime.timezone.utc), latitude=48.667910759559035, longitude=-123.38618598100749, accuracy=6.162333509464028, accuracyType='HoriPhone')
Geo(timestamp=datetime.datetime(2019, 9, 27, 16, 24, 47, 145000, tzinfo=datetime.timezone.utc), latitude=48.66790415214656, longitude=-123.38618235385722, accuracy=5.5269390333275465, accuracyType='HoriPhone')
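Each Geo entry carries a timestamp, latitude, longitude, and accuracy, so simple track summaries are straightforward; for example, a rough bounding box (a sketch; it pulls the whole track into memory and no output is shown):
>>> track = list(rsk.geodata())
>>> latitudes = [g.latitude for g in track]
>>> longitudes = [g.longitude for g in track]
>>> bounding_box = (min(latitudes), min(longitudes), max(latitudes), max(longitudes))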
Regions¶
If you deployed your instrument with cast detection enabled, you can easily work with the data from each profile or cast.
The profiles function provides a generator to access all profiles:
>>> rsk.profiles()
<generator object RSK._query_regions at 0x10741bf10>
>>> next(rsk.profiles()) # Grab the first profile.
Region(start_time=datetime.datetime(2015, 8, 29, 8, 28, 9, 333000, tzinfo=datetime.timezone.utc), end_time=datetime.datetime(2015, 8, 29, 8, 35, 14, 667000, tzinfo=datetime.timezone.utc), label='', description=None)
And individual casts can be accessed by direction:
>>> next(rsk.casts(pyrsktools.Region.CAST_DOWN)) # Downcasts...
Region(start_time=datetime.datetime(2015, 8, 29, 8, 28, 9, 333000, tzinfo=datetime.timezone.utc), end_time=datetime.datetime(2015, 8, 29, 8, 31, 45, 333000, tzinfo=datetime.timezone.utc), label='', description=None)
>>> next(rsk.casts(pyrsktools.Region.CAST_UP)) # ...and upcasts.
Region(start_time=datetime.datetime(2015, 8, 29, 8, 31, 45, 333000, tzinfo=datetime.timezone.utc), end_time=datetime.datetime(2015, 8, 29, 8, 35, 14, 667000, tzinfo=datetime.timezone.utc), label='', description=None)
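Both generators can, of course, be consumed in full; for example, to count the profiles in the dataset and collect the start time of every downcast (a sketch; no output is shown because it depends on your dataset):
>>> n_profiles = sum(1 for _ in rsk.profiles())
>>> downcast_starts = [c.start_time for c in rsk.casts(pyrsktools.Region.CAST_DOWN)]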
The Region object returned by both of these methods provides access to just the samples that fall within the time span of the profile or cast:
>>> cast = next(rsk.casts(pyrsktools.Region.CAST_UP))
>>> cast.samples()
<generator object RSK.samples at 0x10741bf10>
>>> cast.npsamples()
array([ (datetime.datetime(2015, 8, 29, 8, 31, 45, 333000, tzinfo=datetime.timezone.utc), 47.28219604, 19.76168823, 115.52977753, 1.56193031e-02, 0.48699951, 0.51086426, -0.17822266, 105.39778137, 104.75022125, 34.63968658),
(datetime.datetime(2015, 8, 29, 8, 31, 45, 500000, tzinfo=datetime.timezone.utc), 47.18008041, 19.68182373, 115.39983368, 7.59091694e-03, 0.49487305, 0.515625 , -0.17773438, 105.26783752, 104.62107849, 34.62173462),
(datetime.datetime(2015, 8, 29, 8, 31, 45, 667000, tzinfo=datetime.timezone.utc), 47.1333313 , 19.57336426, 115.24497223, 1.43164247e-02, 0.48565674, 0.52685547, -0.18701172, 105.11297607, 104.46716309, 34.67309189),
...,
(datetime.datetime(2015, 8, 29, 8, 35, 14, 167000, tzinfo=datetime.timezone.utc), 50.45388412, 28.91912842, 11.95853233, 1.04415199e+02, 0.35913086, 0.32739258, -0.12060547, 1.82653236, 1.81531024, 30.39524841),
(datetime.datetime(2015, 8, 29, 8, 35, 14, 333000, tzinfo=datetime.timezone.utc), 50.45434189, 28.91921997, 11.91053104, 1.04440277e+02, 0.37084961, 0.29492188, -0.11181641, 1.77853107, 1.76760387, 30.39551353),
(datetime.datetime(2015, 8, 29, 8, 35, 14, 500000, tzinfo=datetime.timezone.utc), 50.45206451, 28.91882324, 11.87156296, 1.04411316e+02, 0.36724854, 0.30310059, -0.11621094, 1.73956299, 1.72887516, 30.39424133)],
dtype=[('timestamp', 'O'), ('conductivity_00', '<f8'), ('temperature_00', '<f8'), ('pressure_00', '<f8'), ('oxygensaturation_00', '<f8'), ('chlorophyll_00', '<f8'), ('cdom_00', '<f8'), ('turbidity_00', '<f8'), ('seapressure_00', '<f8'), ('depth_00', '<f8'), ('salinity_00', '<f8')])
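Combining casts with their samples makes per-cast summaries easy; for example, the maximum depth reached on each downcast (a sketch assuming NumPy; no output is shown because it depends on your dataset):
>>> max_depths = [cast.npsamples()['depth_00'].max()
...               for cast in rsk.casts(pyrsktools.Region.CAST_DOWN)]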
Useful Things¶
Plotting¶
All of these examples depend on NumPy and Matplotlib:
>>> import numpy as np
>>> import matplotlib.pyplot as plt
and are run against the first upcast found in our dataset:
>>> samples = next(rsk.casts(pyrsktools.Region.CAST_UP)).npsamples()
Time-Series¶
>>> plt.title('Time-Series Example')
>>> plt.xlabel('Time')
>>> plt.ylabel(rsk.channels['temperature_00'].label)
>>> plt.plot(samples['timestamp'], samples['temperature_00'])
>>> plt.savefig('timeseries.svg')
Depth Plot¶
>>> plt.title('Depth Plot Example')
>>> plt.xlabel(rsk.channels['salinity_00'].label)
>>> plt.ylabel(rsk.channels['depth_00'].label)
>>> plt.gca().invert_yaxis()
>>> plt.plot(samples['salinity_00'], samples['depth_00'])
>>> plt.savefig('depthplot.svg')
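The same pieces combine to overlay casts; for instance, temperature against depth for the downcast and upcast of the first profile on a single set of axes (a sketch following the conventions above; the figure filename is arbitrary):
>>> down = next(rsk.casts(pyrsktools.Region.CAST_DOWN)).npsamples()
>>> up = next(rsk.casts(pyrsktools.Region.CAST_UP)).npsamples()
>>> plt.figure()  # Start a fresh figure so the earlier plots don't bleed in
>>> plt.title('Cast Comparison Example')
>>> plt.xlabel(rsk.channels['temperature_00'].label)
>>> plt.ylabel(rsk.channels['depth_00'].label)
>>> plt.gca().invert_yaxis()
>>> plt.plot(down['temperature_00'], down['depth_00'], label='downcast')
>>> plt.plot(up['temperature_00'], up['depth_00'], label='upcast')
>>> plt.legend()
>>> plt.savefig('castcomparison.svg')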