Time data ASCII

This example shows how to read time data from an ASCII file using the default ASCII data reader. Doing so requires a metadata file; the example demonstrates how to create an appropriate metadata file and what information it needs to contain.

The dataset in this example has been provided for use by the SAMTEX consortium. For more information, please refer to [Jones2009]. Additional details about the dataset can be found at https://www.mtnet.info/data/kap03/kap03.html.

The dataset is KAP175. A couple of notes:

  • The data has a sample every 5 seconds, meaning a 0.2 Hz sampling frequency.

  • Values of 1E32 (the missing-data flag in the file header) have been replaced by NaN; a sketch of how such a replacement can be done is shown after this list.
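
For reference, this kind of replacement is straightforward with pandas. Below is a minimal sketch on a made-up three-sample frame (df_example and its values are hypothetical); the 1e30 threshold is chosen to catch the 1.00000003E+32 missing-data flag quoted later in the file header (MIS_DATA).

import pandas as pd

# Hypothetical frame with one flagged sample (1E32 marks missing data)
df_example = pd.DataFrame({"Ex": [1.5, 1.00000003e32, -2.0]})
# mask replaces values where the condition holds with NaN by default
df_example = df_example.mask(df_example.abs() > 1e30)
print(df_example)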

from pathlib import Path
import plotly
import pandas as pd
from resistics.time import ChanMetadata, TimeMetadata, TimeReaderAscii, InterpolateNans

Define the data path. This depends on where the data is stored; here, a relative path to the example data is used.

time_data_path = Path("..", "..", "data", "time", "ascii")
ascii_data_path = time_data_path / "kap175as.ts"

The folder contains a single ASCII data file. Let’s have a look at the contents of the file.

with ascii_data_path.open("r") as f:
    for line_number, line in enumerate(f):
        print(line.strip("\n"))
        if line_number >= 130:
            break

Out:

# time series file from tssplice
# date: Mon Nov  7 05:44:13 2016
#
# Files spliced together:
# kap175a1  2003-10-31 11:00:00-2003-11-06 15:17:39
# kap175b1  2003-11-06 16:00:00-2003-11-15 09:56:39
#
# Following comment block from first file...
#
# time series file from mp2ts
# date: Mon Nov  7 05:44:07 2016
#
# input file: kap175\kap175a1.1mp
#
# Machine endian: Little
# UNIX set      : F
#
# site description: maroi
#
# Latitude   :022:11:30 S
# Longitude  :029:51:31 E
#
# LiMS acquisition code : 10.2
# LiMS box     number   :   53
# Magnetometer number   :   53
#
# Ex line length (m):     100.00
# Ey line length (m):      94.00
#
# Azimuths relative to: MAGNETIC NORTH
# Ex azimuth;  30
# Ey azimuth; 120
# Hx azimuth;  30
# Hy azimuth; 120
#
# FIRST 20 POINTS DROPPED FROM .1mp FILE TO
# ACCOUNT FOR FILTER SETTLING
#
#F Filter block begin
#F
#F Filters applied to LiMS/LRMT data are:
#F 1: Analogue anti-alias six-pole Bessel low-pass
#F    filters on each channel with -3 dB point at nominally 5 Hz.
#F    -calibrated values given below
#F
#F 2: Digital anti-alias multi-stage Chebyshev FIR filters
#F    with final stage at 2xsampling rate
#F
#F 1: Analogue single-pole Butterworth high-pass filters on the
#F    telluric channels only with -3 dB point at nominally 30,000 s
#F    -calibrated values given below
#F
#F Chan     Calib    Low-pass   High-pass (s)
#F  1        1.00        0.00        0.00
#F  2        1.00        0.00        0.00
#F  3        1.00        0.00        0.00
#F  4        1.00        0.00        0.00
#F  5        1.00        0.00        0.00
#F
#F In the tsrestack code, these filter responses are
#F removed using bessel7.f and high17.f
#F
#F Filter block end
>INFO_START:
>STATION   :kap175
>INSTRUMENT: 53
>WINDOW    :kap175as
>LATITUDE  : -22.1916695
>LONGITUDE :  29.8586102
>ELEVATION :  0.
>UTM_ORIGIN:  27.
>UTM_NORTH : -2456678
>UTM_EAST  : 794763
>COORD_SYS :MAGNETIC NORTH
>DECLIN    :  0.
>FORM      :ASCII
>FORMAT    :FREE
>SEQ_REC   : 1
>NCHAN     : 5
>CHAN_1    :HX
>SENSOR_1  : 53
>AZIM_1    :  30.
>UNITS_1   :nT
>GAIN_1    :  1.
>BASELINE_1:  12618.2402
>CHAN_2    :HY
>SENSOR_2  : 53
>AZIM_2    :  120.
>UNITS_2   :nT
>GAIN_2    :  1.
>BASELINE_2: -7354.87988
>CHAN_3    :HZ
>SENSOR_3  : 53
>AZIM_3    :  0.
>UNITS_3   :nT
>GAIN_3    :  1.
>BASELINE_3: -26291.1992
>CHAN_4    :EX
>SENSOR_4  : 53
>AZIM_4    :  30.
>UNITS_4   :mV/km
>GAIN_4    :  1.
>CHAN_5    :EY
>SENSOR_5  : 53
>AZIM_5    :  120.
>UNITS_5   :mV/km
>GAIN_5    :  1.
>STARTTIME :2003-10-31 11:00:00
>ENDTIME   :2003-11-15 09:56:39
>T_UNITS   :s
>DELTA_T   :  5.
>MIS_DATA  :  1.00000003E+32
>INFO_END  :
  2.39398551  1.43499565  2.21125007 -1.55760086  0.0748437345
  2.23659754  1.09759927  2.16549993 -6.5316  1.9800632
  1.6032145  0.608650982  2.02824998 -14.0248184  3.94819808
  0.724482358 -0.00434030406  1.79949999 -22.1152382  4.54121494
 -0.170995399 -0.679827273  1.54025006 -28.7814693  4.58896542
 -1.17621446 -1.34823668  1.28100002 -33.5379982  5.47701597
 -2.31609321 -1.93590927  0.991250038 -37.1378212  6.52709579
 -3.41281223 -2.44583607  0.701499999 -38.6588211  6.6605401
 -4.29263926 -2.81293082  0.442250013 -37.8415413  6.49546909
 -4.97082424 -3.01078033  0.244000003 -35.6481323  6.60037565
 -5.44532394 -3.07342243  0.106749997 -31.5662174  6.48429155
 -5.67856073 -2.94394422  0.0305000003 -26.1866817  6.25566292
 -5.76094007 -2.70975876  0.0152500002 -22.297369  6.53417683
 -5.86769009 -2.52486253  0.0152500002 -20.8914051  7.27097702
 -6.06688833 -2.39334106  0. -20.4016094  7.70362616
 -6.25846148 -2.27502656 -0.0305000003 -20.194458  7.71082592
 -6.485569 -2.24766445 -0.0305000003 -22.052597  7.81821918
 -6.84828472 -2.35142326 -0.0457499996 -26.1871376  8.14745235

Note that the metadata requires the number of samples. Reading the data with pandas is a convenient way to count them.

df = pd.read_csv(ascii_data_path, header=None, skiprows=121, delim_whitespace=True)
n_samples = len(df.index)
print(df)

Out:

                 0           1           2          3         4
0        -4.292639   -2.812931    0.442250 -37.841541  6.495469
1        -4.970824   -3.010780    0.244000 -35.648132  6.600376
2        -5.445324   -3.073422    0.106750 -31.566217  6.484292
3        -5.678561   -2.943944    0.030500 -26.186682  6.255663
4        -5.760940   -2.709759    0.015250 -22.297369  6.534177
...            ...         ...         ...        ...       ...
258428 -162.422211 -738.918762 -504.817841  10.141210 -3.474090
258429 -162.422211 -738.918762 -504.817841   9.408063 -3.485243
258430 -162.422211 -738.918762 -504.817841   8.700190 -3.631670
258431 -162.422211 -738.918762 -504.817841   8.757909 -3.823143
258432 -162.422211 -738.918762 -504.817841   8.475470 -4.004943

[258433 rows x 5 columns]

Define the other key pieces of recording information: the sampling frequency, the channels and the timestamps of the first and last samples.

fs = 0.2
chans = ["Hx", "Hy", "Hz", "Ex", "Ey"]
first_time = pd.Timestamp("2003-10-31 11:00:00")
last_time = first_time + (n_samples - 1) * pd.Timedelta(1 / fs, "s")
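
As a quick sanity check, the computed last time can be printed. It should match the last_time entry in the metadata summary shown further below.

# Expect 2003-11-15 09:56:00, matching the metadata summary below
print(last_time)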

The next step is to create a TimeMetadata object. The TimeMetadata holds information about the recording and its channels. Let’s construct the TimeMetadata and save it as a JSON file alongside the time series data file.

chans_metadata = {}
for chan in chans:
    chan_type = "electric" if chan in ["Ex", "Ey"] else "magnetic"
    chans_metadata[chan] = ChanMetadata(
        name=chan, chan_type=chan_type, data_files=[ascii_data_path.name]
    )
time_metadata = TimeMetadata(
    fs=fs,
    chans=chans,
    n_samples=n_samples,
    first_time=first_time,
    last_time=last_time,
    chans_metadata=chans_metadata,
)
time_metadata.summary()
time_metadata.write(time_data_path / "metadata.json")

Out:

{
    'file_info': None,
    'fs': 0.2,
    'chans': ['Hx', 'Hy', 'Hz', 'Ex', 'Ey'],
    'n_chans': 5,
    'n_samples': 258433,
    'first_time': '2003-10-31 11:00:00.000000_000000_000000_000000',
    'last_time': '2003-11-15 09:56:00.000000_000000_000000_000000',
    'system': '',
    'serial': '',
    'wgs84_latitude': -999.0,
    'wgs84_longitude': -999.0,
    'easting': -999.0,
    'northing': -999.0,
    'elevation': -999.0,
    'chans_metadata': {
        'Hx': {
            'name': 'Hx',
            'data_files': ['kap175as.ts'],
            'chan_type': 'magnetic',
            'chan_source': None,
            'sensor': '',
            'serial': '',
            'gain1': 1,
            'gain2': 1,
            'scaling': 1,
            'chopper': False,
            'dipole_dist': 1,
            'sensor_calibration_file': '',
            'instrument_calibration_file': ''
        },
        'Hy': {
            'name': 'Hy',
            'data_files': ['kap175as.ts'],
            'chan_type': 'magnetic',
            'chan_source': None,
            'sensor': '',
            'serial': '',
            'gain1': 1,
            'gain2': 1,
            'scaling': 1,
            'chopper': False,
            'dipole_dist': 1,
            'sensor_calibration_file': '',
            'instrument_calibration_file': ''
        },
        'Hz': {
            'name': 'Hz',
            'data_files': ['kap175as.ts'],
            'chan_type': 'magnetic',
            'chan_source': None,
            'sensor': '',
            'serial': '',
            'gain1': 1,
            'gain2': 1,
            'scaling': 1,
            'chopper': False,
            'dipole_dist': 1,
            'sensor_calibration_file': '',
            'instrument_calibration_file': ''
        },
        'Ex': {
            'name': 'Ex',
            'data_files': ['kap175as.ts'],
            'chan_type': 'electric',
            'chan_source': None,
            'sensor': '',
            'serial': '',
            'gain1': 1,
            'gain2': 1,
            'scaling': 1,
            'chopper': False,
            'dipole_dist': 1,
            'sensor_calibration_file': '',
            'instrument_calibration_file': ''
        },
        'Ey': {
            'name': 'Ey',
            'data_files': ['kap175as.ts'],
            'chan_type': 'electric',
            'chan_source': None,
            'sensor': '',
            'serial': '',
            'gain1': 1,
            'gain2': 1,
            'scaling': 1,
            'chopper': False,
            'dipole_dist': 1,
            'sensor_calibration_file': '',
            'instrument_calibration_file': ''
        }
    },
    'history': {'records': []}
}
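
As an aside, resistics metadata classes are pydantic models, so the JSON written above can in principle be loaded back into a TimeMetadata instance. The round-trip below assumes the pydantic v1 parse_file API used by resistics 1.x; treat it as an assumption rather than something demonstrated by this example.

# Assumption: TimeMetadata is a pydantic v1 model, in which case
# parse_file reconstructs the object from the JSON written above
time_metadata_read = TimeMetadata.parse_file(time_data_path / "metadata.json")
time_metadata_read.summary()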

Now the data is ready to be read in by resistics. Read it in and print the first and last sample values.

reader = TimeReaderAscii(extension=".ts", n_header=121)
time_data = reader.run(time_data_path)
print(time_data.data[:, 0])
print(time_data.data[:, -1])

Out:

[ -4.2926393  -2.8129308   0.44225   -37.84154     6.495469 ]
[-162.42221   -738.91876   -504.81784      8.47547     -4.0049434]

There are some invalid values in the data, which have been replaced by NaN (the 1E32 flag values noted earlier). Interpolate over the NaN values to fill these gaps.

time_data = InterpolateNans().run(time_data)
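
To confirm the interpolation has removed all the invalid samples, one could check that no NaN values remain in the underlying array. This check is an illustrative addition and assumes numpy is available (it is a dependency of resistics).

import numpy as np

# After interpolation the data array should contain no NaNs
assert not np.isnan(time_data.data).any()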

Finally plot the data. By default, the data is downsampled using the LTTB algorithm to avoid slow and large plots.

fig = time_data.plot(max_pts=1_000)
fig.update_layout(height=700)
plotly.io.show(fig)
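
The figure is interactive. Instead of displaying it, it could also be saved to a standalone HTML file with plotly's write_html; the file name here is purely illustrative.

# Optionally save the interactive figure to a standalone HTML file
fig.write_html("kap175_time_data.html")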
