Using
The Essentials
Opening a Dataset
The module includes a function to open an RSK file:
>>> import pyrsktools # Import the library
>>> rsk = pyrsktools.open('some_rsk.rsk') # Load up an RSK
This returns an RSK object against which all other library operations are performed.
Once you’re finished with the dataset, it should be closed:
>>> rsk.close()
open can also be used as a context manager in a with statement:
>>> with pyrsktools.open('some_rsk.rsk') as rsk:
...     # Do something with the RSK. It will be automatically closed
...     # at the end of the block.
...     pass
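The with form behaves much like calling close in a finally clause. The sketch below illustrates the equivalence with a hypothetical stand-in class, FakeRSK, so that it runs without an RSK file. FakeRSK is not part of pyrsktools; it only mimics the open-then-close behaviour described above.

```python
class FakeRSK:
    """Hypothetical stand-in for an RSK object; not part of pyrsktools."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    # Context-manager protocol: __exit__ runs even if the block raises.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


# Explicit form: close in a finally clause so it runs even on error.
explicit = FakeRSK()
try:
    pass  # work with the dataset here
finally:
    explicit.close()

# Equivalent with-statement form.
with FakeRSK() as managed:
    pass  # work with the dataset here

print(explicit.closed, managed.closed)
```

Either form guarantees the dataset is closed; the with statement simply makes it harder to forget.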
What’s Inside?
Metadata
The RSK provides some basic metadata about itself:
>>> rsk.name # What was the filename of the RSK?
'080281_20150911_1112.rsk'
the instrument that recorded it:
>>> rsk.instrument # What instrument was used?
Instrument(serial=80281, model='RBRmaestro', firmware_version='1.2', firmware_type=103)
>>> rsk.channels # What channels were present on the instrument?
OrderedDict([('conductivity_00', Channel(id=1, key='cond06', label='conductivity_00', name='Conductivity', units='mS/cm', derived=False)), ('temperature_00', Channel(id=2, key='temp09', label='temperature_00', name='Temperature', units='°C', derived=False)), ('pressure_00', Channel(id=3, key='pres19', label='pressure_00', name='Pressure', units='dbar', derived=False)), ('oxygensaturation_00', Channel(id=4, key='doxy09', label='oxygensaturation_00', name='Dissolved O₂', units='%', derived=False)), ('chlorophyll_00', Channel(id=5, key='fluo10', label='chlorophyll_00', name='Chlorophyll a', units='µg/l', derived=False)), ('cdom_00', Channel(id=6, key='fluo11', label='cdom_00', name='CDOM', units='ppb', derived=False)), ('turbidity_00', Channel(id=7, key='turb01', label='turbidity_00', name='Turbidity', units='NTU', derived=False)), ('seapressure_00', Channel(id=8, key='pres08', label='seapressure_00', name='Sea pressure', units='dbar', derived=True)), ('depth_00', Channel(id=9, key='dpth01', label='depth_00', name='Depth', units='m', derived=True)), ('salinity_00', Channel(id=10, key='sal_00', label='salinity_00', name='Salinity', units='PSU', derived=True))])
and the deployment:
>>> rsk.deployment
Deployment(id=1, comment='', logger_status=None, logger_time_drift=0, download_time=datetime.datetime(2015, 9, 11, 7, 12, 30, 905000, tzinfo=datetime.timezone.utc), name='080281_20150911_1112.rsk', sample_size=6711588)
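Because the channels are keyed by label and each one carries a derived flag, it is straightforward to separate measured channels from computed ones. The following is a minimal sketch that uses a hypothetical stand-in Channel tuple with the same fields as the output above, populated with a few of those values. Whether the real Channel is a plain namedtuple is an assumption made only for illustration.

```python
from collections import OrderedDict, namedtuple

# Hypothetical stand-in mirroring the Channel fields printed above.
Channel = namedtuple("Channel", ["id", "key", "label", "name", "units", "derived"])

# A subset of the channels shown above, keyed by label like rsk.channels.
channels = OrderedDict(
    (c.label, c)
    for c in [
        Channel(2, "temp09", "temperature_00", "Temperature", "°C", False),
        Channel(3, "pres19", "pressure_00", "Pressure", "dbar", False),
        Channel(9, "dpth01", "depth_00", "Depth", "m", True),
        Channel(10, "sal_00", "salinity_00", "Salinity", "PSU", True),
    ]
)

# Split the labels by the derived flag, preserving channel order.
measured = [label for label, ch in channels.items() if not ch.derived]
derived = [label for label, ch in channels.items() if ch.derived]

print("measured:", measured)
print("derived:", derived)
```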
Samples
But you probably care most about the sample data. Samples can be accessed in two ways. They can always be accessed iteratively, via a generator:
>>> rsk.samples()
<generator object RSK.samples at 0x10741bf10>
>>> import itertools
>>> for sample in itertools.islice(rsk.samples(), 3):
...     sample
...
Sample(timestamp=datetime.datetime(2015, 8, 29, 8, 28, 9, 333000, tzinfo=datetime.timezone.utc), conductivity_00=50.468727111816406, temperature_00=28.92376708984375, pressure_00=19.332664489746094, oxygensaturation_00=103.3949203491211, chlorophyll_00=0.128173828125, cdom_00=0.0048828125, turbidity_00=1.0341796875, seapressure_00=9.200664520263672, depth_00=9.144135475158691, salinity_00=30.400436401367188)
Sample(timestamp=datetime.datetime(2015, 8, 29, 8, 28, 9, 500000, tzinfo=datetime.timezone.utc), conductivity_00=50.469181060791016, temperature_00=28.92388916015625, pressure_00=19.370471954345703, oxygensaturation_00=103.39202880859375, chlorophyll_00=0.1822509765625, cdom_00=0.082763671875, turbidity_00=1.0419921875, seapressure_00=9.238471984863281, depth_00=9.181711196899414, salinity_00=30.40065574645996)
Sample(timestamp=datetime.datetime(2015, 8, 29, 8, 28, 9, 667000, tzinfo=datetime.timezone.utc), conductivity_00=50.468833923339844, temperature_00=28.92388916015625, pressure_00=19.395383834838867, oxygensaturation_00=103.46443939208984, chlorophyll_00=0.24127197265625, cdom_00=0.116455078125, turbidity_00=1.0439453125, seapressure_00=9.263383865356445, depth_00=9.206469535827637, salinity_00=30.400409698486328)
Or, if NumPy is available, they can be retrieved into a structured array:
>>> rsk.npsamples()
array([ (datetime.datetime(2015, 8, 29, 8, 28, 9, 333000, tzinfo=datetime.timezone.utc), 5.04687271e+01, 28.92376709, 19.33266449, 103.39492035, 0.12817383, 0.00488281, 1.03417969, 9.20066452, 9.14413548, 3.04004364e+01),
(datetime.datetime(2015, 8, 29, 8, 28, 9, 500000, tzinfo=datetime.timezone.utc), 5.04691811e+01, 28.92388916, 19.37047195, 103.39202881, 0.18225098, 0.08276367, 1.04199219, 9.23847198, 9.1817112 , 3.04006557e+01),
(datetime.datetime(2015, 8, 29, 8, 28, 9, 667000, tzinfo=datetime.timezone.utc), 5.04688339e+01, 28.92388916, 19.39538383, 103.46443939, 0.24127197, 0.11645508, 1.04394531, 9.26338387, 9.20646954, 3.04004097e+01),
...,
(datetime.datetime(2015, 9, 11, 7, 11, 26, 833000, tzinfo=datetime.timezone.utc), 5.70757780e-04, 21.03649902, 10.05975151, 105.14163208, -0.0090332 , -0.18981934, -0.04785156, -0.07224846, -0.07180457, 1.03646256e-02),
(datetime.datetime(2015, 9, 11, 7, 11, 27, tzinfo=datetime.timezone.utc), 9.17379744e-04, 21.03649902, 10.06026268, 105.09892273, -0.01196289, -0.18041992, -0.0390625 , -0.07173729, -0.07129654, 1.03617543e-02),
(datetime.datetime(2015, 9, 11, 7, 11, 27, 167000, tzinfo=datetime.timezone.utc), -3.77910328e-04, 21.03656006, 10.06090927, 105.13765717, -0.01171875, -0.17102051, -0.02880859, -0.0710907 , -0.07065392, 0.00000000e+00)],
dtype=[('timestamp', 'O'), ('conductivity_00', '<f8'), ('temperature_00', '<f8'), ('pressure_00', '<f8'), ('oxygensaturation_00', '<f8'), ('chlorophyll_00', '<f8'), ('cdom_00', '<f8'), ('turbidity_00', '<f8'), ('seapressure_00', '<f8'), ('depth_00', '<f8'), ('salinity_00', '<f8')])
Data returned from both the samples and npsamples functions can be time-limited via the start_time and end_time named arguments:
>>> from datetime import datetime, timezone
>>> rsk.npsamples(start_time=datetime(2015, 9, 2, tzinfo=timezone.utc),
...               end_time=datetime(2015, 9, 3, tzinfo=timezone.utc))
array([ (datetime.datetime(2015, 9, 2, 0, 0, tzinfo=datetime.timezone.utc), 50.78747177, 23.58898926, 97.16591644, 2.78220224, 0.44580078, 0.55810547, -0.15380859, 87.03392029, 86.49918365, 34.35256195),
(datetime.datetime(2015, 9, 2, 0, 0, 0, 167000, tzinfo=datetime.timezone.utc), 50.81333542, 23.61236572, 97.32424164, 2.77061081, 0.44104004, 0.54382324, -0.15332031, 87.19224548, 86.65653992, 34.35404587),
(datetime.datetime(2015, 9, 2, 0, 0, 0, 333000, tzinfo=datetime.timezone.utc), 50.84253311, 23.61846924, 97.51081848, 2.74815583, 0.42736816, 0.5559082 , -0.15722656, 87.3788147 , 86.84196472, 34.3714447 ),
...,
(datetime.datetime(2015, 9, 2, 23, 59, 59, 500000, tzinfo=datetime.timezone.utc), 47.95243073, 20.36456299, 113.82951355, 0.53426731, 0.95812988, 0.65759277, -0.08203125, 103.69750977, 103.06039429, 34.69152832),
(datetime.datetime(2015, 9, 2, 23, 59, 59, 667000, tzinfo=datetime.timezone.utc), 47.94488144, 20.36938477, 113.88066864, 0.53906661, 0.96099854, 0.65661621, -0.08105469, 103.74867249, 103.1112442 , 34.681427 ),
(datetime.datetime(2015, 9, 2, 23, 59, 59, 833000, tzinfo=datetime.timezone.utc), 47.93851089, 20.36968994, 113.9302597 , 0.53700191, 0.96630859, 0.63171387, -0.06982422, 103.79826355, 103.16053009, 34.67597961)],
dtype=[('timestamp', 'O'), ('conductivity_00', '<f8'), ('temperature_00', '<f8'), ('pressure_00', '<f8'), ('oxygensaturation_00', '<f8'), ('chlorophyll_00', '<f8'), ('cdom_00', '<f8'), ('turbidity_00', '<f8'), ('seapressure_00', '<f8'), ('depth_00', '<f8'), ('salinity_00', '<f8')])
The values given are expected to be datetime objects.
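The timestamps in the output above are timezone-aware (UTC), so it is safest to build the bounds as aware datetime objects too. The sketch below uses only the standard library to construct a one-day window and to show that Python refuses to order a naive datetime against an aware one. The text does not say whether end_time is treated as inclusive or exclusive, so the half-open comparison here is an assumption for illustration, not a description of the library's behaviour.

```python
from datetime import datetime, timedelta, timezone

# Build a one-day window as timezone-aware UTC datetimes.
start_time = datetime(2015, 9, 2, tzinfo=timezone.utc)
end_time = start_time + timedelta(days=1)

# A sample timestamp copied from the output above.
stamp = datetime(2015, 9, 2, 23, 59, 59, 833000, tzinfo=timezone.utc)

# Assumed half-open window check: start <= t < end.
in_window = start_time <= stamp < end_time

# A naive datetime cannot be ordered against an aware one.
try:
    datetime(2015, 9, 2) < stamp
    naive_comparable = True
except TypeError:
    naive_comparable = False

print(in_window, naive_comparable)
```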
Sample data is intended to be easily explorable through the use of named fields:
>>> # The average of the first 6,000 temperature values:
>>> temperatures = [row.temperature_00 for row in itertools.islice(rsk.samples(), 6_000)]
>>> sum(temperatures) / len(temperatures)
26.74410189819336
>>> # All salinity values from the start of the dataset to a cutoff date:
>>> rsk.npsamples(end_time=datetime(2015, 8, 29, 13, 0, tzinfo=timezone.utc))['salinity_00']
array([ 30.4004364 , 30.40065575, 30.4004097 , ..., 32.32573318,
32.32009506, 32.31411743])
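When only a summary statistic is needed, the generator can be consumed directly instead of being collected into a list first. The sketch below shows the idea with a hypothetical stand-in generator, fake_samples, which yields a reduced Sample tuple holding the three samples printed earlier. Neither fake_samples nor the reduced Sample is part of pyrsktools.

```python
from collections import namedtuple
from datetime import datetime, timezone
from itertools import islice
from statistics import fmean

# Hypothetical, reduced stand-in for the Sample tuples shown above.
Sample = namedtuple("Sample", ["timestamp", "temperature_00", "salinity_00"])


def fake_samples():
    """Hypothetical stand-in for rsk.samples(); values copied from the output above."""
    yield Sample(datetime(2015, 8, 29, 8, 28, 9, 333000, tzinfo=timezone.utc),
                 28.92376708984375, 30.400436401367188)
    yield Sample(datetime(2015, 8, 29, 8, 28, 9, 500000, tzinfo=timezone.utc),
                 28.92388916015625, 30.40065574645996)
    yield Sample(datetime(2015, 8, 29, 8, 28, 9, 667000, tzinfo=timezone.utc),
                 28.92388916015625, 30.400409698486328)


# Average the first two temperatures without building an intermediate list.
mean_temperature = fmean(s.temperature_00 for s in islice(fake_samples(), 2))

print(mean_temperature)
```

For a large dataset this keeps memory use flat, at the cost of only being able to pass over the data once per generator.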
Geographic
The iOS and Android apps collect GPS geodata, which is accessible via the geodata function:
>>> rsk.geodata()
<generator object RSK.geodata at 0x1245cdf48>
>>> import itertools
>>> for geo in itertools.islice(rsk.geodata(), 3):
...     geo
...
Geo(timestamp=datetime.datetime(2019, 9, 27, 16, 24, 45, 145000, tzinfo=datetime.timezone.utc), latitude=48.66791163945951, longitude=-123.38619064505487, accuracy=7.074523989379777, accuracyType='HoriPhone')
Geo(timestamp=datetime.datetime(2019, 9, 27, 16, 24, 46, 145000, tzinfo=datetime.timezone.utc), latitude=48.667910759559035, longitude=-123.38618598100749, accuracy=6.162333509464028, accuracyType='HoriPhone')
Geo(timestamp=datetime.datetime(2019, 9, 27, 16, 24, 47, 145000, tzinfo=datetime.timezone.utc), latitude=48.66790415214656, longitude=-123.38618235385722, accuracy=5.5269390333275465, accuracyType='HoriPhone')
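Each fix carries an accuracy value, which makes it possible to pick out the most trustworthy position. The sketch below uses a hypothetical stand-in Geo tuple populated with the three fixes printed above. It assumes that a smaller accuracy value means a better fix, which the text does not state.

```python
from collections import namedtuple
from datetime import datetime, timezone

# Hypothetical stand-in mirroring the Geo fields printed above.
Geo = namedtuple("Geo", ["timestamp", "latitude", "longitude", "accuracy", "accuracyType"])

# The three fixes from the output above.
fixes = [
    Geo(datetime(2019, 9, 27, 16, 24, 45, 145000, tzinfo=timezone.utc),
        48.66791163945951, -123.38619064505487, 7.074523989379777, "HoriPhone"),
    Geo(datetime(2019, 9, 27, 16, 24, 46, 145000, tzinfo=timezone.utc),
        48.667910759559035, -123.38618598100749, 6.162333509464028, "HoriPhone"),
    Geo(datetime(2019, 9, 27, 16, 24, 47, 145000, tzinfo=timezone.utc),
        48.66790415214656, -123.38618235385722, 5.5269390333275465, "HoriPhone"),
]

# Assumption: a lower accuracy value indicates a tighter position estimate.
best = min(fixes, key=lambda g: g.accuracy)

print(best.timestamp, best.latitude, best.longitude)
```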
Regions
If you deployed your instrument with cast detection enabled, you can easily work with the data from each profile or cast.
The profiles function provides a generator to access all profiles:
>>> rsk.profiles()
<generator object RSK._query_regions at 0x10741bf10>
>>> next(rsk.profiles()) # Grab the first profile.
Region(start_time=datetime.datetime(2015, 8, 29, 8, 28, 9, 333000, tzinfo=datetime.timezone.utc), end_time=datetime.datetime(2015, 8, 29, 8, 35, 14, 667000, tzinfo=datetime.timezone.utc), label='', description=None)
Individual casts can be selected by direction:
>>> next(rsk.casts(pyrsktools.Region.CAST_DOWN)) # Downcasts...
Region(start_time=datetime.datetime(2015, 8, 29, 8, 28, 9, 333000, tzinfo=datetime.timezone.utc), end_time=datetime.datetime(2015, 8, 29, 8, 31, 45, 333000, tzinfo=datetime.timezone.utc), label='', description=None)
>>> next(rsk.casts(pyrsktools.Region.CAST_UP)) # ...and upcasts.
Region(start_time=datetime.datetime(2015, 8, 29, 8, 31, 45, 333000, tzinfo=datetime.timezone.utc), end_time=datetime.datetime(2015, 8, 29, 8, 35, 14, 667000, tzinfo=datetime.timezone.utc), label='', description=None)
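Since every region exposes start_time and end_time, durations fall out of ordinary datetime arithmetic. The sketch below uses a hypothetical stand-in Region tuple filled with the profile, downcast, and upcast printed above, and checks that the two casts together span the whole profile. The utc and duration helpers are illustrative names, not library functions.

```python
from collections import namedtuple
from datetime import datetime, timezone

# Hypothetical stand-in mirroring the Region fields printed above.
Region = namedtuple("Region", ["start_time", "end_time", "label", "description"])


def utc(*args):
    """Illustrative helper: build a timezone-aware UTC datetime."""
    return datetime(*args, tzinfo=timezone.utc)


def duration(region):
    """Illustrative helper: length of a region as a timedelta."""
    return region.end_time - region.start_time


# Values copied from the output above.
profile = Region(utc(2015, 8, 29, 8, 28, 9, 333000), utc(2015, 8, 29, 8, 35, 14, 667000), "", None)
down = Region(utc(2015, 8, 29, 8, 28, 9, 333000), utc(2015, 8, 29, 8, 31, 45, 333000), "", None)
up = Region(utc(2015, 8, 29, 8, 31, 45, 333000), utc(2015, 8, 29, 8, 35, 14, 667000), "", None)

print("downcast:", duration(down))
print("upcast:  ", duration(up))
print("profile: ", duration(profile))
```

In this dataset the upcast begins at the instant the downcast ends, so the two durations add up to that of the profile.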
The Region object returned by both of these methods provides access to only those samples recorded during the profile or cast:
>>> cast = next(rsk.casts(pyrsktools.Region.CAST_UP))
>>> cast.samples()
<generator object RSK.samples at 0x10741bf10>
>>> cast.npsamples()
array([ (datetime.datetime(2015, 8, 29, 8, 31, 45, 333000, tzinfo=datetime.timezone.utc), 47.28219604, 19.76168823, 115.52977753, 1.56193031e-02, 0.48699951, 0.51086426, -0.17822266, 105.39778137, 104.75022125, 34.63968658),
(datetime.datetime(2015, 8, 29, 8, 31, 45, 500000, tzinfo=datetime.timezone.utc), 47.18008041, 19.68182373, 115.39983368, 7.59091694e-03, 0.49487305, 0.515625 , -0.17773438, 105.26783752, 104.62107849, 34.62173462),
(datetime.datetime(2015, 8, 29, 8, 31, 45, 667000, tzinfo=datetime.timezone.utc), 47.1333313 , 19.57336426, 115.24497223, 1.43164247e-02, 0.48565674, 0.52685547, -0.18701172, 105.11297607, 104.46716309, 34.67309189),
...,
(datetime.datetime(2015, 8, 29, 8, 35, 14, 167000, tzinfo=datetime.timezone.utc), 50.45388412, 28.91912842, 11.95853233, 1.04415199e+02, 0.35913086, 0.32739258, -0.12060547, 1.82653236, 1.81531024, 30.39524841),
(datetime.datetime(2015, 8, 29, 8, 35, 14, 333000, tzinfo=datetime.timezone.utc), 50.45434189, 28.91921997, 11.91053104, 1.04440277e+02, 0.37084961, 0.29492188, -0.11181641, 1.77853107, 1.76760387, 30.39551353),
(datetime.datetime(2015, 8, 29, 8, 35, 14, 500000, tzinfo=datetime.timezone.utc), 50.45206451, 28.91882324, 11.87156296, 1.04411316e+02, 0.36724854, 0.30310059, -0.11621094, 1.73956299, 1.72887516, 30.39424133)],
dtype=[('timestamp', 'O'), ('conductivity_00', '<f8'), ('temperature_00', '<f8'), ('pressure_00', '<f8'), ('oxygensaturation_00', '<f8'), ('chlorophyll_00', '<f8'), ('cdom_00', '<f8'), ('turbidity_00', '<f8'), ('seapressure_00', '<f8'), ('depth_00', '<f8'), ('salinity_00', '<f8')])
Useful Things
Plotting
All of these examples depend on NumPy and Matplotlib:
>>> import numpy as np
>>> import matplotlib.pyplot as plt
and are run against the first upcast found in our dataset:
>>> samples = next(rsk.casts(pyrsktools.Region.CAST_UP)).npsamples()
Time-Series
>>> plt.title('Time-Series Example')
>>> plt.xlabel('Time')
>>> plt.ylabel(rsk.channels['temperature_00'].label())
>>> plt.plot(samples['timestamp'], samples['temperature_00'])
>>> plt.savefig('timeseries.svg')
Depth Plot
>>> plt.figure() # Start a new figure so this plot is not drawn over the previous one
>>> plt.title('Depth Plot Example')
>>> plt.xlabel(rsk.channels['salinity_00'].label())
>>> plt.ylabel(rsk.channels['depth_00'].label())
>>> plt.gca().invert_yaxis()
>>> plt.plot(samples['salinity_00'], samples['depth_00'])
>>> plt.savefig('depthplot.svg')