Handling time-series data¶
Before processing any data, we need to handle all data we have: select specific subsets of these data and gather them. In practice, time data come with multiple sampling rates and time intervals. Data of one channel can be split in multiple files.
Razorback provides tools to help in the task of isolating and grouping the data of interest. See the signalset module for the detailled documentation.
Setting up¶
We first import razorback:
[1]:
import razorback as rb
Usualy, data are read from files, in which case the io module is here to help. But for this tutorial, we will work with fake signals. The following (generator) function will build them:
[2]:
import numpy as np
def build_fake_signals(infos):
""" yield some fake signal sets
Each signal set contains 5 channels (Ex, Ey, Hx, Hy, Hz)
for one run at one site, at a given sampling rate and given time interval.
The raw data of the signals are 'empty' arrays.
"""
tags_tpl = {'Ex_%s': 0, 'Ey_%s': 1, 'Hx_%s': 2, 'Hy_%s': 3, 'Hz_%s': 4,
'E_%s': (0, 1), 'H_%s': (2, 3, 4), 'site_%s': (0, 1, 2, 3, 4)}
for site_id, rate, start, stop in infos:
tags = {k % site_id: v for k, v in tags_tpl.items()}
size = int((stop-start) * rate) + 1
raw_data = np.empty((5, size))
signals = rb.SyncSignal(raw_data, rate, start)
yield rb.SignalSet(tags, signals)
We consider a toy situation where we have recorded MT signals (Ex, Ey, Hx, Hy, Hz) on 5 sites (1, 2, 3, 4, 5), some with several runs and different sampling rates. The situation could be pictured by:
==== ======================================
site time
==== ======================================
1 ~~~~~~~~
2 ~~~~~~ ~~~~~~ ^^^^^^^
3 ~~~~~~~~~~~~~~~~~ ^^^^^^^^^^
4 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
5 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
==== ======================================
~~~~~ : continuous run sampled at 512 Hz
^^^^^ : continuous run sampled at 1024 Hz
We use our build_fake_signals()
function to generate the signals and store the 8 signals in the all_signals
list:
[3]:
infos = [
# (site, sampling_rate, start_time, end_time)
( 1, 512, 110, 250),
( 2, 512, 0, 100),
( 2, 512, 120, 220),
( 2, 1024, 400, 450),
( 3, 512, 10, 300),
( 3, 1024, 350, 500),
( 4, 512, 5, 530),
( 5, 1024, 0, 550),
]
all_signals = list(build_fake_signals(infos))
We pick the four first signals that correspond to site 1 and 2 for further investigations:
[4]:
signal_1, signal_2a, signal_2b, signal_2c = all_signals[:4]
Inspecting a signal set¶
We can print signal_1 to get a summary of what we know about it:
[5]:
print(signal_1)
SignalSet: 5 channels, 1 run
tags: {'Ex_1': (0,), 'Ey_1': (1,), 'Hx_1': (2,),
'Hy_1': (3,), 'Hz_1': (4,), 'E_1': (0, 1),
'H_1': (2, 3, 4), 'site_1': (0, 1, 2, 3, 4)}
---------- ------------------- -------------------
sampling start stop
512 1970-01-01 00:01:50 1970-01-01 00:04:10
---------- ------------------- -------------------
This is a SignalSet object that contains the records of 5 channels on 1 run. Next we see the tags as a dictionnary, then some data about the run.
The tags are stored in a Tags object accessible by:
[6]:
signal_1.tags
[6]:
Tags(5, Ex_1=(0,), Ey_1=(1,), Hx_1=(2,), Hy_1=(3,), Hz_1=(4,), E_1=(0, 1), H_1=(2, 3, 4), site_1=(0, 1, 2, 3, 4))
The tags allows accessing to specific channels by using name identifiers. For instance, ‘Ex_1’ and ‘Ey_1’ correspond to indices 0 and 1. The tags also exposes names for groups of indices, like ‘E_1’ here that corresponds to the group (0, 1) i.e. the both channels ‘Ex_1’ and ‘Ey_1’.
Then we see the run: its sampling rate, its start time and its stop time. The run itself is accessible by:
[7]:
signal_1.signals
[7]:
(SyncSignal([5x71681], sampling_rate=5.1e+02, start=1.1e+02, calibrations=[...]),)
Actualy, we get a tuple because a SignalSet
object can handle multiple runs. A more detailled view is given by:
[8]:
print(signal_1.signals[0])
SyncSignal
- nb of channels : 5
- signal size : 71681
- sampling rate : 512.0 Hz
- start : 1970-01-01 00:01:50
- stop : 1970-01-01 00:04:10
This is a SyncSignal object that contains 5 channels, sampled at 512 Hz. SyncSignal
objects are a light wrapper around the raw data of a bunch of synchronous signals. Here, synchronous means that all signals start and stop at the same time and have the same sampling rate. Note that the channels of a SyncSignal
object are only identified by their indices not by tags.
We can access to all the informations of a signal set through its attribute:
[9]:
signal_1.nb_channels
[9]:
5
[10]:
signal_1.nb_runs
[10]:
1
[11]:
signal_1.sampling_rates
[11]:
array([512.])
[12]:
signal_1.intervals
[12]:
array([[110., 250.]])
[13]:
signal_1.sizes
[13]:
array([71681])
Joining successive runs¶
signal_2a
, signal_2b
and signal_2c
are records of 3 different runs on the site 2.
[14]:
print(signal_2a)
SignalSet: 5 channels, 1 run
tags: {'Ex_2': (0,), 'Ey_2': (1,), 'Hx_2': (2,),
'Hy_2': (3,), 'Hz_2': (4,), 'E_2': (0, 1),
'H_2': (2, 3, 4), 'site_2': (0, 1, 2, 3, 4)}
---------- ------------------- -------------------
sampling start stop
512 1970-01-01 00:00:00 1970-01-01 00:01:40
---------- ------------------- -------------------
[15]:
print(signal_2b)
SignalSet: 5 channels, 1 run
tags: {'Ex_2': (0,), 'Ey_2': (1,), 'Hx_2': (2,),
'Hy_2': (3,), 'Hz_2': (4,), 'E_2': (0, 1),
'H_2': (2, 3, 4), 'site_2': (0, 1, 2, 3, 4)}
---------- ------------------- -------------------
sampling start stop
512 1970-01-01 00:02:00 1970-01-01 00:03:40
---------- ------------------- -------------------
[16]:
print(signal_2c)
SignalSet: 5 channels, 1 run
tags: {'Ex_2': (0,), 'Ey_2': (1,), 'Hx_2': (2,),
'Hy_2': (3,), 'Hz_2': (4,), 'E_2': (0, 1),
'H_2': (2, 3, 4), 'site_2': (0, 1, 2, 3, 4)}
---------- ------------------- -------------------
sampling start stop
1024 1970-01-01 00:06:40 1970-01-01 00:07:30
---------- ------------------- -------------------
We see that they all have the same tags and that the 3 time intervals don’t overlap. For these reasons, we can join them (see SignalSet.join()) into one signal set, signal_2
:
[17]:
signal_2 = signal_2a | signal_2b | signal_2c
When we print signal_2
, we see that now the 3 runs of site 2 are gathered into one SignalSet
object:
[18]:
print(signal_2)
SignalSet: 5 channels, 3 runs
tags: {'Ex_2': (0,), 'Ey_2': (1,), 'Hx_2': (2,),
'Hy_2': (3,), 'Hz_2': (4,), 'E_2': (0, 1),
'H_2': (2, 3, 4), 'site_2': (0, 1, 2, 3, 4)}
---------- ------------------- -------------------
sampling start stop
512 1970-01-01 00:00:00 1970-01-01 00:01:40
512 1970-01-01 00:02:00 1970-01-01 00:03:40
1024 1970-01-01 00:06:40 1970-01-01 00:07:30
---------- ------------------- -------------------
And now attributes relatives to the runs give the values for all runs:
[19]:
signal_2.sampling_rates
[19]:
array([ 512., 512., 1024.])
[20]:
signal_2.intervals
[20]:
array([[ 0., 100.],
[120., 220.],
[400., 450.]])
[21]:
signal_2.starts
[21]:
array([ 0., 120., 400.])
Note that these values are passed as numpy array, allowing for a lot of operations on them:
[22]:
signal_2.sampling_rates == 1024
[22]:
array([False, False, True])
Extracting time intervals¶
The SignalSet.extract_t() method can narrow the time interval of a signal set. The runs outside the interval are skipped and the others are eventually reduced to fit the interval:
[23]:
print(signal_2.extract_t(50, 200))
SignalSet: 5 channels, 2 runs
tags: {'Ex_2': (0,), 'Ey_2': (1,), 'Hx_2': (2,),
'Hy_2': (3,), 'Hz_2': (4,), 'E_2': (0, 1),
'H_2': (2, 3, 4), 'site_2': (0, 1, 2, 3, 4)}
---------- ------------------- -------------------
sampling start stop
512 1970-01-01 00:00:50 1970-01-01 00:01:40
512 1970-01-01 00:02:00 1970-01-01 00:03:20
---------- ------------------- -------------------
The SignalSet.extract_t()
method can also be used to exclude a given time interval:
[24]:
print(signal_2.extract_t(50, 200, exclude=True))
SignalSet: 5 channels, 3 runs
tags: {'Ex_2': (0,), 'Ey_2': (1,), 'Hx_2': (2,),
'Hy_2': (3,), 'Hz_2': (4,), 'E_2': (0, 1),
'H_2': (2, 3, 4), 'site_2': (0, 1, 2, 3, 4)}
---------- ------------------- -------------------
sampling start stop
512 1970-01-01 00:00:00 1970-01-01 00:00:50
512 1970-01-01 00:03:20 1970-01-01 00:03:40
1024 1970-01-01 00:06:40 1970-01-01 00:07:30
---------- ------------------- -------------------
Splitting and merging channels¶
We can extract the Electric field only from the Signal Set
[25]:
print(signal_2.E_2)
SignalSet: 2 channels, 3 runs
tags: {'Ex_2': (0,), 'Ey_2': (1,), 'E_2': (0, 1)}
---------- ------------------- -------------------
sampling start stop
512 1970-01-01 00:00:00 1970-01-01 00:01:40
512 1970-01-01 00:02:00 1970-01-01 00:03:40
1024 1970-01-01 00:06:40 1970-01-01 00:07:30
---------- ------------------- -------------------
We can Extracting only one component of E and H and joining them in a SignalSet
[26]:
print(signal_2.Ex_2 & signal_2.Hy_2)
SignalSet: 2 channels, 3 runs
tags: {'Ex_2': (0,), 'Hy_2': (1,)}
---------- ------------------- -------------------
sampling start stop
512 1970-01-01 00:00:00 1970-01-01 00:01:40
512 1970-01-01 00:02:00 1970-01-01 00:03:40
1024 1970-01-01 00:06:40 1970-01-01 00:07:30
---------- ------------------- -------------------
[27]:
print(signal_1 & signal_2) # Merging two Signal Sets --> keeping the synchronous part of signal_1 and signal_2
SignalSet: 10 channels, 1 run
tags: {'Ex_1': (0,), 'Ex_2': (5,), 'Ey_1': (1,),
'Ey_2': (6,), 'Hx_1': (2,), 'Hx_2': (7,),
'Hy_1': (3,), 'Hy_2': (8,), 'Hz_1': (4,),
'Hz_2': (9,), 'E_1': (0, 1), 'E_2': (5, 6),
'H_1': (2, 3, 4), 'H_2': (7, 8, 9),
'site_1': (0, 1, 2, 3, 4),
'site_2': (5, 6, 7, 8, 9)}
---------- ------------------- -------------------
sampling start stop
512 1970-01-01 00:02:00 1970-01-01 00:03:40
---------- ------------------- -------------------
Selecting channels and runs¶
We can define a mask in order to locate 512 Hz sampling data only
[28]:
mask = signal_2.sampling_rates == 512
mask
[28]:
array([ True, True, False])
We select the 512 Hz run by using the appropriate mask
[29]:
print(signal_2.select_runs(mask))
SignalSet: 5 channels, 2 runs
tags: {'Ex_2': (0,), 'Ey_2': (1,), 'Hx_2': (2,),
'Hy_2': (3,), 'Hz_2': (4,), 'E_2': (0, 1),
'H_2': (2, 3, 4), 'site_2': (0, 1, 2, 3, 4)}
---------- ------------------- -------------------
sampling start stop
512 1970-01-01 00:00:00 1970-01-01 00:01:40
512 1970-01-01 00:02:00 1970-01-01 00:03:40
---------- ------------------- -------------------
You can select all the other data not contained in the mask by using ~mask
[30]:
print(signal_2.select_runs(~mask))
SignalSet: 5 channels, 1 run
tags: {'Ex_2': (0,), 'Ey_2': (1,), 'Hx_2': (2,),
'Hy_2': (3,), 'Hz_2': (4,), 'E_2': (0, 1),
'H_2': (2, 3, 4), 'site_2': (0, 1, 2, 3, 4)}
---------- ------------------- -------------------
sampling start stop
1024 1970-01-01 00:06:40 1970-01-01 00:07:30
---------- ------------------- -------------------
Store, search & combine: the inventory¶
You can create an Inventory Object from all_signals
[31]:
inventory = rb.Inventory(all_signals)
One can print all the tags of the Inventory:
[32]:
inventory.tags
[32]:
{'E_1',
'E_2',
'E_3',
'E_4',
'E_5',
'Ex_1',
'Ex_2',
'Ex_3',
'Ex_4',
'Ex_5',
'Ey_1',
'Ey_2',
'Ey_3',
'Ey_4',
'Ey_5',
'H_1',
'H_2',
'H_3',
'H_4',
'H_5',
'Hx_1',
'Hx_2',
'Hx_3',
'Hx_4',
'Hx_5',
'Hy_1',
'Hy_2',
'Hy_3',
'Hy_4',
'Hy_5',
'Hz_1',
'Hz_2',
'Hz_3',
'Hz_4',
'Hz_5',
'site_1',
'site_2',
'site_3',
'site_4',
'site_5'}
We can select sub-inventory/SignalSet using the tag of the MT site site_3
and show its tags:
[33]:
sub_inventory = inventory.select_channels('site_3')
print(sub_inventory)
print()
print(sub_inventory.tags)
Inventory([SignalSet({'Ex_3': (0,), 'Ey_3': (1,), 'Hx_3': (2,), 'Hy_3': (3,), 'Hz_3': (4,), 'E_3': (0, 1), 'H_3': (2, 3, 4), 'site_3': (0, 1, 2, 3, 4)}, SyncSignal([5x148481], sampling_rate=5.1e+02, start=10, calibrations=[...])), SignalSet({'Ex_3': (0,), 'Ey_3': (1,), 'Hx_3': (2,), 'Hy_3': (3,), 'Hz_3': (4,), 'E_3': (0, 1), 'H_3': (2, 3, 4), 'site_3': (0, 1, 2, 3, 4)}, SyncSignal([5x153601], sampling_rate=1e+03, start=3.5e+02, calibrations=[...]))])
{'Ex_3', 'Ey_3', 'Hx_3', 'H_3', 'Hy_3', 'site_3', 'Hz_3', 'E_3'}
The pack instruction creates the largest (longest) signal set that contains sub_inventory.tags
[34]:
print(sub_inventory.pack())
SignalSet: 5 channels, 2 runs
tags: {'Ex_3': (0,), 'Ey_3': (1,), 'Hx_3': (2,),
'Hy_3': (3,), 'Hz_3': (4,), 'E_3': (0, 1),
'H_3': (2, 3, 4), 'site_3': (0, 1, 2, 3, 4)}
---------- ------------------- -------------------
sampling start stop
512 1970-01-01 00:00:10 1970-01-01 00:05:00
1024 1970-01-01 00:05:50 1970-01-01 00:08:20
---------- ------------------- -------------------
Here, first we create a sub-inventory using tags of the MT sites site_2
and site_3
, then we create a SignalSet object out of it (using pack()
):
[35]:
print(inventory.select_channels('site_2', 'site_3').pack())
SignalSet: 10 channels, 3 runs
tags: {'Ex_2': (0,), 'Ex_3': (5,), 'Ey_2': (1,),
'Ey_3': (6,), 'Hx_2': (2,), 'Hx_3': (7,),
'Hy_2': (3,), 'Hy_3': (8,), 'Hz_2': (4,),
'Hz_3': (9,), 'E_2': (0, 1), 'E_3': (5, 6),
'H_2': (2, 3, 4), 'H_3': (7, 8, 9),
'site_2': (0, 1, 2, 3, 4),
'site_3': (5, 6, 7, 8, 9)}
---------- ------------------- -------------------
sampling start stop
512 1970-01-01 00:00:10 1970-01-01 00:01:40
512 1970-01-01 00:02:00 1970-01-01 00:03:40
1024 1970-01-01 00:06:40 1970-01-01 00:07:30
---------- ------------------- -------------------
Same operation as above with site_3
and site_5
. Please note that both sites have no common synchronous recordings at 512Hz.
[36]:
print(inventory.select_channels('site_3', 'site_5').pack())
SignalSet: 10 channels, 1 run
tags: {'Ex_3': (0,), 'Ex_5': (5,), 'Ey_3': (1,),
'Ey_5': (6,), 'Hx_3': (2,), 'Hx_5': (7,),
'Hy_3': (3,), 'Hy_5': (8,), 'Hz_3': (4,),
'Hz_5': (9,), 'E_3': (0, 1), 'E_5': (5, 6),
'H_3': (2, 3, 4), 'H_5': (7, 8, 9),
'site_3': (0, 1, 2, 3, 4),
'site_5': (5, 6, 7, 8, 9)}
---------- ------------------- -------------------
sampling start stop
1024 1970-01-01 00:05:50 1970-01-01 00:08:20
---------- ------------------- -------------------
Same operation as above with site_2
and site_4
. Please note that both sites have ONLY common synchronous recordings at 512Hz.
[37]:
print(inventory.select_channels('site_2', 'site_4').pack())
SignalSet: 10 channels, 2 runs
tags: {'Ex_2': (0,), 'Ex_4': (5,), 'Ey_2': (1,),
'Ey_4': (6,), 'Hx_2': (2,), 'Hx_4': (7,),
'Hy_2': (3,), 'Hy_4': (8,), 'Hz_2': (4,),
'Hz_4': (9,), 'E_2': (0, 1), 'E_4': (5, 6),
'H_2': (2, 3, 4), 'H_4': (7, 8, 9),
'site_2': (0, 1, 2, 3, 4),
'site_4': (5, 6, 7, 8, 9)}
---------- ------------------- -------------------
sampling start stop
512 1970-01-01 00:00:05 1970-01-01 00:01:40
512 1970-01-01 00:02:00 1970-01-01 00:03:40
---------- ------------------- -------------------
When there is no synchronousness between the different SyncSignals
channels-, pack()
instruction returns None
. Consequently, the MT user needs to check the synchronoussness of the different channels.
[38]:
print(inventory.pack())
None
Same operation as above but a filter is applied in order to select electric fields only.
[39]:
print(inventory.select_channels('site_2', 'site_4').filter('E_*').pack())
SignalSet: 4 channels, 2 runs
tags: {'Ex_2': (0,), 'Ex_4': (2,), 'Ey_2': (1,),
'Ey_4': (3,), 'E_2': (0, 1), 'E_4': (2, 3)}
---------- ------------------- -------------------
sampling start stop
512 1970-01-01 00:00:05 1970-01-01 00:01:40
512 1970-01-01 00:02:00 1970-01-01 00:03:40
---------- ------------------- -------------------
[ ]:
[ ]:
[ ]: