UVData
UVData objects hold all of the metadata and data required to analyze interferometric
data sets. Interferometric data is fundamentally tied to baselines, which are composed
of pairs of antennas. Visibilities, the measured quantity recorded from interferometers,
are complex numbers per baseline, time, frequency and instrumental polarization. On
UVData objects, visibilities are held in the data_array. The data_array has axes
corresponding to baseline-time, frequency and instrumental polarization, so the baselines
and times are indexed together. This is because it is not uncommon for interferometers
not to record every baseline at every time for several reasons (including
baseline-dependent averaging). Note that UVData can also support combining the frequency
and polarization axis, which can be useful in certain circumstances, objects represented
this way are called flex_pol objects and are more fully described in UVData: flex_pol objects.
Metadata that is associated with the telescope (as opposed to the data set) is stored in
a pyuvdata.Telescope object as the telescope attribute on a UVData object.
This includes metadata related to the telescope location, antenna names, numbers and
positions as well as other telescope metadata.
The antennas are described in two ways: with antenna numbers and antenna names. The
antenna numbers should not be confused with indices – they are not required to start
at zero or to be contiguous, although it is not uncommon for some telescopes to number
them like indices. On UVData objects, the names and numbers are held in the
telescope.antenna_names and telescope.antenna_numbers attributes
respectively. These are arranged in the same order so that an antenna number
can be used to identify an antenna name and vice versa.
Note that not all the antennas listed in telescope.antenna_numbers and
telescope.antenna_names are guaranteed to have visibilities associated with
them in the data_array.
For most users, the convenience methods for quick data access (see UVData: Quick data access)
are the easiest way to get data for particular sets of baselines. Those methods take
the antenna numbers (i.e. numbers listed in telescope.antenna_numbers) as inputs.
Users interested in indexing/manipulating the data array directly can find more
information below.
The antenna numbers associated with each visibility are held in the ant_1_array
and ant_2_array attributes. These arrays have the same length as the
data_array along the baseline-time axis, and which array the numbers appear
in (ant_1_array vs ant_2_array) indicates the direction of the baseline. On
UVData objects, the baseline vector is defined to point from antenna 1 to antenna 2, so
it is given by the position of antenna 2 minus the position of antenna 1. Since the
ant_1_array and ant_2_array attributes have the length of the baseline-time axis,
when there is more than one time integration in the data there are many repetitions of
each baseline. The times for each visibility are given by the time_array attribute
which also has the same length (the length of the baseline-time axis on the data_array).
There is also a baseline_array attribute with baseline numbers defined from the
ant_1_array and ant_2_array as
\(baseline = 2048 \times (antenna_1) + (antenna_2) + 2^{16}\).
This gives a unique baseline number for each antenna pair and can be a useful way to
identify visibilities associated with particular baselines. The baseline_array
attribute has the same length as the ant_1_array and ant_2_array (the length of
the baseline-time axis on the data_array).
Note
Our tutorial uses small data files for examples. The data files are hosted in
the the RASG datasets repo,
organized by data type and telescope. In the tutorials this data is downloaded
and cached using the pooch package via the pyuvdata.datasets.fetch_data
function. To run those commands you’ll need to have pooch installed (you can
install it yourself or use pip install pyuvdata[tutorial]). Note that pooch
will download the file the first time you ask for it and save it in a cache
folder, subsequent calls to fetch that data will not re-download it.
UVData: Instantiating a UVData object from a file (i.e. reading data)
Use the pyuvdata.UVData.from_file() to instantiate a UVData object from
data in a file (alternatively you can create an object with no inputs and then
call the pyuvdata.UVData.read() method). Most file types require a single
file or folder to instantiate an object, FHD and raw MWA correlator data sets
require the user to specify multiple files for each dataset.
pyuvdata can also be used to create a UVData object from arrays in memory
(see UVData: Instantiating from arrays in memory) and to read in multiple datasets (files) into a single object
(see d) Reading multiple files.).
Note
Reading or writing CASA Measurement sets requires python-casacore to be installed (see the readme for details). Note that the python-casacore and casatools/casatasks packages are incompatible – importing them both causes segfaults. Reading or writing Miriad files is not supported on Windows.
a) Instantiate an object from a single file or folder
UVFITS and uvh5 and datasets are stored in a single file. Miriad, CASA Measurement Sets and MIR datasets are stored in structured folders, for these file types pass in the folder name.
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
filename = fetch_data("vla_casa_tutorial_uvfits")
uvd = UVData.from_file(filename)
b) Instantiate an object from an FHD dataset
When reading FHD datasets, we need to pass in several auxilliary files.
import os
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
# Set up the files we need
fhd_prefix = '1061316296_'
fhd_path = fetch_data("mwa_fhd")
fhd_vis_files = [os.path.join(fhd_path, "vis_data", fhd_prefix + f) for f in ["vis_XX.sav", "vis_YY.sav"]]
flags_file = os.path.join(fhd_path, "vis_data", fhd_prefix + "flags.sav")
layout_file = os.path.join(fhd_path, "metadata", fhd_prefix + "layout.sav")
params_file = os.path.join(fhd_path, "metadata", fhd_prefix + "params.sav")
settings_file = os.path.join(fhd_path, "metadata", fhd_prefix + "settings.txt")
uvd = UVData.from_file(
fhd_vis_files,
flags_file=flags_file,
layout_file=layout_file,
params_file=params_file,
settings_file=settings_file,
)
c) Instantiate an object from a raw MWA correlator dataset
The MWA correlator writes FITS files containing the correlator dumps (but lacking metadata and not conforming to the uvfits format). pyuvdata can read these files from both the Legacy and MWAX correlator versions, along with MWA metafits files (containing the required metadata), into a UVData object. There are also options for applying cable length corrections, dividing out digital gains, dividing out the coarse band shape, common flagging patterns, using AOFlagger flag files, and phasing the data to the pointing center. It is also optional to apply a Van Vleck correction for Legacy correlator data. The default for this correction is to use a Chebyshev polynomial approximation, and there is an option to instead use a slower integral implementation.
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
# Construct the list of files. Separate files for each coarse band, the
# associated metafits file is also required.
filelist = fetch_data(["mwa_2015_metafits", "mwa_2015_raw_gpubox01"])
# Apply cable corrections and routine time/frequency flagging, phase data to pointing center
uvd = UVData().from_file(filelist, correct_cable_len=True, phase_to_pointing_center=True, flag_init=True)
c) Options for SMA MIR data sets
The SMA has its own bespoke file format known as MIR (no relation to MIRIAD),
which most users prefer to convert to the CASA-based Measurement Sets (MS) for
further processing. The pyuvdata.UVData.from_file() method (and by extension,
pyuvdata.UVData.read() as well) has support for a few extra keywords that
are specific to the MIR file format. These keywords fall broadly into two groups:
selection, and visibility handling.
In addition to the selection keywords supported with UVData objects, there are a few
extra keywords supported for MIR data sets:
- corrchunk: Specifies (typically DSB) correlator window(s) to load.
receiver: Specifies a receiver type (generally some combination of “230”, “240”,“345”, and/or “400”) to load, with different receivers typically used to target different bands and/or polarizations.
sideband: Specifies which sideband to load, with the two options being “l” forlower and “u” for upper.
pseudo_contSpecifies whether to load the “pseudo-continuum” data, which isconstructed as the average of all channels across a single spectral window (set to
Falseby default).
select_whereAn keyword which allows for more advanced selection criterion.See the documentation in
pyuvdata.mir_parser.MirParser()for more details.
Some example use cases for the selection keywords:
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
# Create a path to the SMA MIR test data set in pyuvdata
data_path = fetch_data("sma_mir")
# Let's first try loading just the "230" receivers.
uvd = UVData.from_file(data_path, receivers="230")
assert (uvd.Npols, uvd.Nfreqs) == (1, 131072)
# Now try one sideband, say the lower ("l")
uvd.read(data_path, sidebands="l")
assert (uvd.Npols, uvd.Nfreqs) == (2, 65536)
# Now try one just one chunk (2)
uvd.read(data_path, corrchunk=2)
assert (uvd.Npols, uvd.Nfreqs) == (2, 32768)
# Now all together -- "230" receiver, "l" sideband, chunks 1 and 3
uvd.read(data_path, receivers="230", sidebands="l", corrchunk=[1, 3])
assert (uvd.Npols, uvd.Nfreqs) == (1, 32768)
As for visibility handling keywords:
rechunk: Number of channels to spectrally average the data over on read. This isgenerally the most commonly used keyword, as it reduces the memory/disk space needed to complete read/write operations.
apply_tsys: Normalize the data using system temperature measurements to producesvalues in (uncalibrated) Jy (default is
True).
apply_flags: Apply on-line flags (default isTrue).
For example, the native resolution of the test MIR dataset is 140 kHz – to average this down by a factor of 64 (8.96 MHz resolution) do the following:
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
# Build a path to the test SMA data set
data_path = fetch_data("sma_mir")
# Set things up to average over 64-channel blocks.
uvd = UVData.from_file(data_path, rechunk=64)
Warning
Reading and writing of MIR data will on occasion generate a warning message about the LSTs not being correct. This warning is spurious, and a byproduct how LST values are calculated at time of write (polled average versus calculated based on the timestamp/integration midpoint), and can safely be ignored.
UVData: Writing UVData objects to disk
pyuvdata can write UVData objects to UVFITS, Miriad, CASA Measurement Set and
uvh5 files. Each of these has an associated write method:
pyuvdata.UVData.write_uvfits(), pyuvdata.UVData.write_miriad(),
pyuvdata.UVData.write_ms(), pyuvdata.UVData.write_uvh5(), which
only require a filename (or folder name for Miriad and CASA Measurement Sets) to
write the data to.
pyuvdata can be used to simply convert data from one file type to another by reading in one file type and writing out another.
import os
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
ms_file = fetch_data("vla_casa_tutorial_ms")
# Instantiate an object from a measurement set
uvd = UVData.from_file(ms_file)
# Write the data out to a uvfits file
write_file = os.path.join(".", "tutorial.uvfits")
uvd.write_uvfits(write_file)
UVData: Quick data access
A small suite of functions are available to quickly access the underlying numpy
arrays of data, flags, and nsamples. Although the user can perform this indexing
by hand, several convenience functions exist to easily extract specific subsets
corresponding to antenna-pair and/or polarization combinations. There are three
specific methods that will return numpy arrays: pyuvdata.UVData.get_data(),
pyuvdata.UVData.get_flags(), and pyuvdata.UVData.get_nsamples().
When possible, these methods will return numpy MemoryView
objects, which is relatively fast and adds minimal memory overhead. There are
also corresponding methods pyuvdata.UVData.set_data(),
pyuvdata.UVData.set_flags(), and pyuvdata.UVData.set_nsamples()
which will overwrite sections of these datasets with user-provided data.
a) Data for single antenna pair / polarization combination.
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
filename = fetch_data("vla_casa_tutorial_uvfits")
uvd = UVData.from_file(filename)
data = uvd.get_data(1, 2, "rr") # data for ant1=1, ant2=2, pol="rr"
times = uvd.get_times(1, 2) # times for ant1=1, ant2=2 (0th axis of "data" above)
assert data.shape == (9, 64)
assert times.shape == (9,)
# One can equivalently make any of these calls with the input wrapped in a tuple.
data = uvd.get_data((1, 2, "rr"))
times = uvd.get_times((1, 2))
b) Flags and nsamples for above data.
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
filename = fetch_data("vla_casa_tutorial_uvfits")
uvd = UVData.from_file(filename)
flags = uvd.get_flags(1, 2, "rr")
nsamples = uvd.get_nsamples(1, 2, "rr")
assert flags.shape == (9, 64)
assert nsamples.shape == (9, 64)
c) Data for single antenna pair, all polarizations.
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
filename = fetch_data("vla_casa_tutorial_uvfits")
uvd = UVData.from_file(filename)
data = uvd.get_data(1, 2)
assert data.shape == (9, 64, 4)
# Can also give baseline number, this gives the same array:
data2 = uvd.get_data(uvd.antnums_to_baseline(1, 2))
d) Data for single polarization, all baselines.
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
filename = fetch_data("vla_casa_tutorial_uvfits")
uvd = UVData.from_file(filename)
data = uvd.get_data("rr")
assert data.shape == (1360, 64)
e) Update data arrays in place for UVData
There are methods on UVData objects which allow for updating the data, flags, or
nsamples arrays in place. We show how to use the pyuvdata.UVData.set_data()
method below, and note there are analogous pyuvdata.UVData.set_flags()
and pyuvdata.UVData.set_nsamples() methods.
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
filename = fetch_data("vla_casa_tutorial_uvfits")
uvd = UVData.from_file(filename)
data = uvd.get_data(1, 2, "rr", force_copy=True, squeeze="none")
data *= 2
uvd.set_data(data, 1, 2, "rr")
f) Iterate over all antenna pair / polarizations.
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
filename = fetch_data("vla_casa_tutorial_uvfits")
uvd = UVData.from_file(filename)
for key, data in uvd.antpairpol_iter():
flags = uvd.get_flags(key)
nsamples = uvd.get_nsamples(key)
# Do something with the data, flags, nsamples
g) Convenience functions to ask what antennas, baselines, and pols are in the data.
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
filename = fetch_data("vla_casa_tutorial_uvfits")
uvd = UVData.from_file(filename)
# Get all unique antennas in data
unique_ants = uvd.get_ants()
# Get all baseline nums in data.
bl_nums = uvd.get_baseline_nums()
# Get all (ordered) antenna pairs in data (same info as baseline_nums)
antpairs = uvd.get_antpairs()
# Get all antenna pairs and polarizations, i.e. keys produced in UVData.antpairpol_iter()
antpair_pols = uvd.get_antpairpols()
h) Quick access to file attributes of a UV* object (UVData, UVCal, UVBeam)
## in bash ##
# Print data_array.shape to stdout
pyuvdata_inspect.py --attr=data_array.shape <uv*_file>
# Print Ntimes,Nfreqs,Nbls to stdout
pyuvdata_inspect.py --attr=Ntimes,Nfreqs,Nbls <uv*_file>
# Load object to instance name "uv" and will remain in interpreter
pyuvdata_inspect.py -i <uv*_file>
UVData: Plotting
Making a simple waterfall plot.
Note: there is now support for reading in only part of a file for many file types (see UVData: Working with large files), so you need not read in the entire file to plot one waterfall.
from astropy.time import Time
import numpy as np
import matplotlib.pyplot as plt
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
filename = fetch_data("vla_casa_tutorial_uvfits")
uvd = UVData.from_file(filename)
# get the data for a single baseline and polarization
waterfall_data = uvd.get_data((1, 2, uvd.polarization_array[0]))
# get the corresponding times for this waterfall
waterfall_times = Time(uvd.get_times((1, 2, uvd.polarization_array[0])), format="jd").iso
# Amplitude waterfall for all spectral channels and 0th polarization
fig, ax = plt.subplots(1, 1)
_ = ax.imshow(np.abs(waterfall_data), interpolation="none", origin="lower")
_ = ax.set_yticks([0, waterfall_times.size - 1])
_ = ax.set_yticklabels([waterfall_times[0], waterfall_times[1]])
freq_tick_inds = np.concatenate((np.arange(0, uvd.Nfreqs, 16), [uvd.Nfreqs-1]))
_ = ax.set_xticks(freq_tick_inds)
_ = ax.set_xticklabels([f"{val:.3f}" for val in uvd.freq_array[freq_tick_inds]*1e-9])
_ = ax.set_xlabel("Frequency (GHz)")
UVData: Selecting data
The pyuvdata.UVData.select() method lets you select specific antennas (by number or name),
antenna pairs, frequencies (in Hz or by channel number), times (or time range),
local sidereal time (LST) (or LST range), or polarizations to keep in the object
while removing others. By default, pyuvdata.UVData.select() will
select data that matches the supplied criteria, but by setting invert=True, you
can instead deselect this data and preserve only that which does not match the
selection.
Note: The same select interface is now supported on the read for many file types (see UVData: Working with large files), so you need not read in the entire file before doing the select.
a) Select 3 antennas to keep using the antenna number.
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
filename = fetch_data("vla_casa_tutorial_uvfits")
uvd = UVData.from_file(filename)
assert uvd.Nants_data == 18
uvd.select(antenna_nums=[1, 12, 21])
assert uvd.Nants_data == 3
assert uvd.get_ants().tolist() == [1, 12, 21]
b) Select 3 antennas by name and 4 frequencies to keep
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
filename = fetch_data("vla_casa_tutorial_uvfits")
uvd = UVData.from_file(filename)
assert uvd.Nants_data == 18
assert uvd.Nfreqs == 64
uvd.select(antenna_names=["N02", "E09", "W06"], frequencies=uvd.freq_array[0:4])
assert uvd.Nants_data == 3
assert uvd.Nfreqs == 4
c) Select a few antenna pairs to keep
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
filename = fetch_data("vla_casa_tutorial_uvfits")
uvd = UVData.from_file(filename)
# note that order of the values in the pair does not matter
uvd.select(bls=[(1, 2), (7, 1), (1, 21)])
# get all the antenna pairs after the select
antpairs = uvd.get_antpairs()
# note that the antpair is listed as it is in the data, not as it was selected on.
assert antpairs == [(1, 2), (1, 7), (1, 21)]
d) Select antenna pairs using baseline numbers
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
filename = fetch_data("vla_casa_tutorial_uvfits")
uvd = UVData.from_file(filename)
# select baselines using the baseline numbers
uvd.select(bls=[73736, 73753, 81945])
print(uvd.get_antpairs())
assert uvd.get_antpairs() == [(4, 8), (4, 25), (8, 25)]
e) Select polarizations
Selecting on polarizations can be done either using the polarization numbers or the
polarization strings (e.g. “xx” or “yy” for linear polarizations or “rr” or “ll” for
circular polarizations). Under special circumstances, where x-polarization feeds
(as recorded in telescope.feed_array) are aligned to 0 or 90 degrees relative to a
line perpendicular to the horizon (as record in telescope.feed_angle) and/or
y-polarization are aligned to -90 or 0 degrees, strings representing the cardinal
orientation of the dipole can also be used (e.g. “nn” or “ee”).
from pyuvdata import utils, UVData
from pyuvdata.datasets import fetch_data
filename = fetch_data("vla_casa_tutorial_uvfits")
uvd = UVData.from_file(filename)
# polarization numbers can be found in the polarization_array
assert uvd.polarization_array.tolist() == [-1, -2, -3, -4]
# polarization numbers can be converted to strings using a utility function
assert utils.polnum2str(uvd.polarization_array) == ['rr', 'll', 'rl', 'lr']
# select polarizations using the polarization numbers
uvd.select(polarizations=[-1, -2, -3])
assert uvd.polarization_array.tolist() == [-1, -2, -3]
assert utils.polnum2str(uvd.polarization_array) == ['rr', 'll', 'rl']
# select polarizations using the polarization strings
uvd.select(polarizations=["rr", "ll"])
assert uvd.polarization_array.tolist() == [-1, -2]
assert utils.polnum2str(uvd.polarization_array) == ['rr', 'll']
# Now deselect polarizations
uvd.select(polarizations=["ll"], invert=True)
assert uvd.polarization_array.tolist() == [-1]
assert utils.polnum2str(uvd.polarization_array) == ['rr']
# read in a file with linear polarizations and an x_orientation
filename = fetch_data("hera_h3c_uvh5")
uvd = UVData.from_file(filename)
assert uvd.polarization_array.tolist() == [-5, -6]
assert utils.polnum2str(uvd.polarization_array) == ['xx', 'yy']
# check x_orientation
assert uvd.telescope.get_x_orientation_from_feeds() == "north"
# select polarizations using the physical orientation strings
uvd.select(polarizations=["ee"])
assert uvd.polarization_array.tolist() == [-6]
assert utils.polnum2str(uvd.polarization_array) == ['yy']
f) Select antenna pairs and polarizations using ant_str argument
Basic options are “auto”, “cross”, or “all”. “auto” returns just the autocorrelations (all pols), while “cross” returns just the cross-correlations (all pols). The ant_str can also contain:
1. Individual antenna number(s):
1: returns all antenna pairs containing antenna number 1 (including the auto correlation)
1,2: returns all antenna pairs containing antennas 1 and/or 2
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
filename = fetch_data("vla_casa_tutorial_uvfits")
uvd = UVData.from_file(filename)
# check the number of baselines (antenna pairs) in the original file
assert uvd.Nbls == 153
# Apply select to UVData object
uvd.select(ant_str="1,2,3")
# check the number of baselines (antenna pairs) after the select
assert uvd.Nbls == 48
2. Individual baseline(s):
1_2: returns only the antenna pair (1,2)
1_2,1_3,1_10: returns antenna pairs (1,2),(1,3),(1,10)
(1,2)_3: returns antenna pairs (1,3),(2,3)
1_(2,3): returns antenna pairs (1,2),(1,3)
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
filename = fetch_data("vla_casa_tutorial_uvfits")
uvd = UVData.from_file(filename)
# check the number of baselines (antenna pairs) in the original file
assert uvd.Nbls == 153
# Apply select to UVData object
uvd.select(ant_str="(1,2)_(3,7)")
# check the antenna pairs after the select
assert uvd.get_antpairs() == [(1, 3), (1, 7), (2, 3), (2, 7)]
3. Antenna number(s) and polarization(s):
When polarization information is passed with antenna numbers, all antenna pairs kept in the object will retain data for each specified polarization
1x: returns all antenna pairs containing antenna number 1 and polarizations xx and xy
2x_3y: returns the antenna pair (2,3) and polarization xy
1r_2l,1l_3l,1r_4r: returns antenna pairs (1,2), (1,3), (1,4) and polarizations rr, ll, and rl. This yields a complete list of baselines with polarizations of 1r_2l, 1l_2l, 1r_2r, 1r_3l, 1l_3l, 1r_3r, 1r_11l, 1l_11l, and 1r_11r.
(1x,2y)_(3x,4y): returns antenna pairs (1,3),(1,4),(2,3),(2,4) and polarizations xx, yy, xy, and yx
2l_3: returns antenna pair (2,3) and polarizations ll and lr
2r_3: returns antenna pair (2,3) and polarizations rr and rl
1l_3,2x_3: returns antenna pairs (1,3), (2,3) and polarizations ll, lr, xx, and xy
1_3l,2_3x: returns antenna pairs (1,3), (2,3) and polarizations ll, rl, xx, and yx
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
filename = fetch_data("vla_casa_tutorial_uvfits")
uvd = UVData.from_file(filename)
# check the number of baselines and polarizations in the original file
assert uvd.Nbls == 153
assert uvd.get_pols() == ['rr', 'll', 'rl', 'lr']
# Apply select to UVData object
uvd.select(ant_str="1r_2l,1l_3l,1r_7r")
# check the number of antenna pairs and polarizations after the select
assert uvd.get_antpairs() == [(1, 2), (1, 3), (1, 7)]
assert uvd.get_pols() == ['rr', 'll', 'rl']
4. Stokes parameter(s):
Can be passed lowercase or uppercase
i,I: keeps only Stokes I
q,V: keeps both Stokes Q and V
5. Minus sign(s):
If a minus sign is present in front of an antenna number, it will not be kept in the data
1,-3: returns all antenna pairs containing antenna 1, but removes any containing antenna 3
1,-1_3: returns all antenna pairs containing antenna 1, except the antenna pair (1,3)
1x_(-3y,10x): returns antenna pair (1,10) and polarization xx
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
filename = fetch_data("vla_casa_tutorial_uvfits")
uvd = UVData.from_file(filename)
# check the number of baselines (antenna pairs) in the original file
assert uvd.Nbls == 153
# Apply select to UVData object
uvd.select(ant_str="1,-1_3")
# check the number of baselines (antenna pairs) after the select
assert uvd.Nbls == 16
g) Select based on time or local sidereal time (LST)
You can select times to keep on an object by specifying exact times to keep or time ranges to keep or the desired LSTs or LST range. Note that the LST is expected to be in radians (not hours), consistent with how the LSTs are stored on the object. When specifying an LST range, if the first number is larger than the second, the range is assumed to wrap around LST = 0 = 2*pi.
import numpy as np
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
filename = fetch_data("vla_casa_tutorial_uvfits")
uvd = UVData.from_file(filename)
# Times can be found in the time_array, which is length Nblts.
# Use unique to find the unique times (number given by Ntimes)
unique_times = np.unique(uvd.time_array)
assert len(unique_times) == uvd.Ntimes
assert uvd.Ntimes == 15
# make a copy and select some times that are on the object
uvd2 = uvd.copy()
uvd2.select(times=unique_times[0:5])
assert uvd2.Ntimes == 5
# make a copy and select a time range
uvd2 = uvd.copy()
uvd2.select(time_range=[2455312.64023, 2455312.6406])
assert uvd2.Ntimes == 8
# make a copy and select some lsts
uvd2 = uvd.copy()
# LSTs can be found in the lst_array
lsts = np.unique(uvd2.lst_array)
assert len(lsts) == uvd2.Ntimes
# select LSTs that are on the object
uvd2.select(lsts=lsts[0:len(lsts) // 2])
# print length of unique LSTs after select
assert uvd2.Ntimes == 7
h) Select data and return new object (leaving original intact).
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
filename = fetch_data("vla_casa_tutorial_uvfits")
uvd = UVData.from_file(filename)
uvd2 = uvd.select(antenna_nums=[1, 12, 21], inplace=False)
assert uvd2.Nants_data < uvd.Nants_data
UVData: Sorting data along various axes
Methods exist for sorting (and conjugating) data along all the data axes to support comparisons between UVData objects and software access patterns.
a) Conjugating baselines
The pyuvdata.UVData.conjugate_bls() method will conjugate baselines to conform to
various conventions ("ant1<ant2", "ant2<ant1", "u<0", "u>0", "v<0",
"v>0") or it can just conjugate a set of specific baseline-time indices.
import numpy as np
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
uvfits_file = fetch_data("vla_casa_tutorial_uvfits")
uvd = UVData.from_file(uvfits_file)
uvd.conjugate_bls("ant1<ant2")
assert np.all(uvd.ant_2_array > uvd.ant_1_array)
uvd.conjugate_bls("u<0", use_enu=False)
assert np.all(uvd.uvw_array[:, 0] <= 0)
b) Sorting along the baseline-time axis
The pyuvdata.UVData.reorder_blts() method will reorder the baseline-time axis by
sorting by "time", "baseline", "ant1" or "ant2" or according to an order
preferred for data that have baseline dependent averaging "bda". A user can also
just specify a desired order by passing an array of baseline-time indices. There is also
an option to sort the auto visibilities before the cross visibilities (autos_first).
import numpy as np
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
uvfits_file = fetch_data("vla_casa_tutorial_uvfits")
uvd = UVData.from_file(uvfits_file)
# The default is to sort first by time, then by baseline
uvd.reorder_blts()
assert np.all(np.diff(uvd.time_array) >= 0)
# Explicity sorting by "time" then "baseline" gets the same result
uvd2 = uvd.copy()
uvd2.reorder_blts("time", minor_order="baseline")
assert uvd == uvd2
uvd.reorder_blts("ant1", minor_order="ant2")
assert np.all(np.diff(uvd.ant_1_array) >= 0)
# You can also sort and conjugate in a single step
uvd.reorder_blts("bda", conj_convention="ant1<ant2")
c) Sorting along the frequency axis
The pyuvdata.UVData.reorder_freqs() method will reorder the frequency axis by
sorting by spectral windows or channels (or even just the channels within specific
spectral windows). Spectral windows or channels can be sorted by ascending or descending
number or in an order specified by passing an index array for spectral window or
channels.
import numpy as np
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
testfile = fetch_data("sma_mir")
uvd = UVData.from_file(testfile)
# Sort by spectral window number and by frequency within the spectral window
# Now the spectral windows are in ascending order and the frequencies in each window
# are in ascending order.
uvd.reorder_freqs(spw_order="number", channel_order="freq")
assert np.all(np.diff(uvd.spw_array) > 0)
assert np.all(np.diff(uvd.freq_array[np.nonzero(uvd.flex_spw_id_array == 1)]) > 0)
# Prepend a ``-`` to the sort string to sort in descending order.
# Now the spectral windows are in descending order but the frequencies in each window
# are in ascending order.
uvd.reorder_freqs(spw_order="-number", channel_order="freq")
assert np.all(np.diff(uvd.spw_array) < 0)
assert np.all(np.diff(uvd.freq_array[np.nonzero(uvd.flex_spw_id_array == 1)]) > 0)
# Use the ``select_spw`` keyword to sort only one spectral window.
# Now the frequencies in spectral window 1 are in descending order but the frequencies
# in spectral window 2 are in ascending order
uvd.reorder_freqs(select_spw=1, channel_order="-freq")
assert np.all(np.diff(uvd.freq_array[np.nonzero(uvd.flex_spw_id_array == 1)]) < 0)
assert np.all(np.diff(uvd.freq_array[np.nonzero(uvd.flex_spw_id_array == 2)]) > 0)
c) Sorting along the polarization axis
The pyuvdata.UVData.reorder_pols() method will reorder the polarization axis
either following the "AIPS" or "CASA" convention, or by an explicit index
ordering set by the user.
from pyuvdata import utils, UVData
from pyuvdata.datasets import fetch_data
uvfits_file = fetch_data("vla_casa_tutorial_uvfits")
uvd = UVData.from_file(uvfits_file)
assert utils.polnum2str(uvd.polarization_array) == ['rr', 'll', 'rl', 'lr']
uvd.reorder_pols("CASA")
assert utils.polnum2str(uvd.polarization_array) == ['rr', 'rl', 'lr', 'll']
UVData: Averaging and Resampling
pyuvdata has methods to average (downsample) in time and frequency and also to upsample in time (useful to get all baselines on the shortest time integration for a data set that has had baseline dependent time averaging applied).
Use the pyuvdata.UVData.downsample_in_time(),
pyuvdata.UVData.upsample_in_time() and pyuvdata.UVData.resample_in_time()
methods to average (downsample) and upsample in time or to do both at once on data
that have had baseline dependent averaging (BDA) applied to put all the baselines
on the same time integrations. Resampling in time is done on phased data by default,
drift mode data are phased, resampled, and then unphased. Set allow_drift=True
to do resampling without phasing.
Use the pyuvdata.UVData.frequency_average() method to average along the frequency
axis.
a) Averaging (Downsampling) in time
Use either the n_times_to_avg keyword to specify an integer factor to average
by or min_int_time to specify a minimum final integration time. Specifying
min_int_time is most appropriate when the integration time varies, e.g. if
the data have had baseline-dependent averaging applied.
import numpy as np
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
datafile = fetch_data("hera_h3c_uvh5")
uvd = UVData.from_file(datafile)
uvd2 = uvd.copy()
assert np.unique(uvd.integration_time).tolist() == [1.879048192]
# first use n_times_to_avg to average by a factor of 2 in time.
uvd.downsample_in_time(n_times_to_avg=2)
assert np.unique(uvd.integration_time).tolist() == [3.758096384]
# Now use min_int_time to average by a factor of 2 in time.
min_integration_time = np.amax(uvd2.integration_time) * 2.0
uvd2.downsample_in_time(min_int_time=min_integration_time)
assert np.unique(uvd2.integration_time).tolist() == [3.758096384]
b) Upsampling in time
import numpy as np
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
datafile = fetch_data("hera_h3c_uvh5")
uvd = UVData.from_file(datafile)
assert np.unique(uvd.integration_time).tolist() == [1.879048192]
max_integration_time = np.amin(uvd.integration_time) / 2.0
uvd.upsample_in_time(max_integration_time)
assert np.unique(uvd.integration_time).tolist() == [0.939524096]
c) Resampling a BDA dataset in time
import numpy as np
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
testfile = fetch_data("sim_bda")
uvd = UVData.from_file(testfile, default_mount_type="fixed")
assert np.unique(uvd.integration_time).tolist() == [2.0, 4.0, 8.0, 16.0]
# Resample all baselines to an 8s integration time
uvd.resample_in_time(8, allow_drift=True)
assert np.unique(uvd.integration_time).tolist() == [8.0]
d) Averaging in frequency
The pyuvdata.UVData.frequency_average() method takes a number of channels to
average together. Use the keep_ragged parameter to control the handling if the
number of frequencies in each spectral window does not divide evenly by the number of
channels to be averaged together. Use the respect_spws parameter to control whether
averaging will be done over spectral window boundaries.
import numpy as np
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
datafile = fetch_data("hera_h3c_uvh5")
uvd = UVData.from_file(datafile)
assert np.unique(uvd.channel_width).tolist() == [122070.3125]
# Average by a factor of 2 in frequency
uvd.frequency_average(n_chan_to_avg=2, keep_ragged=True)
assert np.unique(uvd.channel_width).tolist() == [244140.625]
UVData: Combining and concatenating data
The pyuvdata.UVData.__add__() method lets you combine UVData objects along
the baseline-time, frequency, and/or polarization axis.
a) Combine frequencies.
import numpy as np
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
filename = fetch_data("vla_casa_tutorial_uvfits")
uvd1 = UVData.from_file(filename)
uvd2 = uvd1.copy()
# Downselect frequencies to recombine
uvd1.select(freq_chans=np.arange(0, 32))
assert uvd1.Nfreqs == 32
uvd2.select(freq_chans=np.arange(32, 64))
assert uvd2.Nfreqs == 32
uvd3 = uvd1 + uvd2
assert uvd3.Nfreqs == 64
b) Combine times.
import numpy as np
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
filename = fetch_data("vla_casa_tutorial_uvfits")
uvd1 = UVData.from_file(filename)
uvd2 = uvd1.copy()
# Downselect times to recombine
times = np.unique(uvd1.time_array)
uvd1.select(times=times[0:len(times) // 2])
assert uvd1.Ntimes == 7
uvd2.select(times=times[len(times) // 2:])
assert uvd2.Ntimes == 8
uvd3 = uvd1 + uvd2
assert uvd3.Ntimes == 15
c) Combine in place.
The following two commands are equivalent, and act on uvd1 directly without creating a third uvdata object.
import numpy as np
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
filename = fetch_data("vla_casa_tutorial_uvfits")
uvd1 = UVData.from_file(filename)
uvd2 = uvd1.copy()
times = np.unique(uvd1.time_array)
uvd1.select(times=times[0:len(times) // 2])
uvd2.select(times=times[len(times) // 2:])
uvd1.__add__(uvd2, inplace=True)
uvd1 = UVData.from_file(filename)
uvd2 = uvd1.copy()
uvd1.select(times=times[0:len(times) // 2])
uvd2.select(times=times[len(times) // 2:])
uvd1 += uvd2
d) Reading multiple files.
If the pyuvdata.UVData.read() method is given a list of dataset files or
folders (or list of lists for FHD or MWA correlator datasets), each dataset will
be read in succession and combined with the previous file(s).
import os
import numpy as np
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
filename = fetch_data("vla_casa_tutorial_uvfits")
uvd = UVData.from_file(filename)
uvd1 = uvd.select(freq_chans=np.arange(0, 20), inplace=False)
uvd2 = uvd.select(freq_chans=np.arange(20, 40), inplace=False)
uvd3 = uvd.select(freq_chans=np.arange(40, 64), inplace=False)
uvd1.write_uvfits(os.path.join('.', 'tutorial1.uvfits'))
uvd2.write_uvfits(os.path.join('.', 'tutorial2.uvfits'))
uvd3.write_uvfits(os.path.join('.', 'tutorial3.uvfits'))
filenames = [os.path.join('.', f) for f in ['tutorial1.uvfits', 'tutorial2.uvfits', 'tutorial3.uvfits']]
uvd = UVData.from_file(filenames)
e) Fast concatenation
As an alternative to the pyuvdata.UVData.__add__() method,
the pyuvdata.UVData.fast_concat() method can be used.
The user specifies a UVData object to combine with the existing one,
along with the axis along which they should be combined. Fast concatenation can
be invoked implicitly when reading in multiple files as above by passing the
axis keyword argument. This will use the fast_concat method instead of
the __add__ method to combine the data contained in the files into a single
UVData object.
Warning
There is no guarantee that two objects combined in this fashion will result in a self-consistent object after concatenation. Basic checking is done, but time-consuming robust checks are eschewed for the sake of speed. The data will also not be reordered or sorted as part of the concatenation, and so this must be done manually by the user if a reordering is desired (see UVData: Sorting data along various axes).
The pyuvdata.UVData.fast_concat() method is significantly faster than
pyuvdata.UVData.__add__(), especially for large UVData objects.
Preliminary benchmarking shows that reading in
time-ordered visibilities from disk using the axis keyword argument can
improve throughput by nearly an order of magnitude for 100 HERA data files
stored as uvh5 files.
import os
import numpy as np
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
filename = fetch_data("vla_casa_tutorial_uvfits")
uvd = UVData.from_file(filename)
uvd1 = uvd.select(freq_chans=np.arange(0, 20), inplace=False)
uvd2 = uvd.select(freq_chans=np.arange(20, 40), inplace=False)
uvd3 = uvd.select(freq_chans=np.arange(40, 64), inplace=False)
uvd1.write_uvfits(os.path.join('.', 'tutorial1.uvfits'))
uvd2.write_uvfits(os.path.join('.', 'tutorial2.uvfits'))
uvd3.write_uvfits(os.path.join('.', 'tutorial3.uvfits'))
filenames = [os.path.join('.', f) for f in ['tutorial1.uvfits', 'tutorial2.uvfits', 'tutorial3.uvfits']]
uvd = UVData.from_file(filenames, axis='freq')
UVData: Summing and differencing visibilities
Simple summing and differencing of visibilities can be done with the pyuvdata.UVData.sum_vis()
and pyuvdata.UVData.diff_vis() methods.
import numpy as np
from astropy.time import Time
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
filename = fetch_data("vla_casa_tutorial_uvfits")
uvd1 = UVData.from_file(filename)
uvd2 = uvd1.copy()
# sum visibilities
uvd1 = uvd1.sum_vis(uvd2)
# diff visibilities
uvd1 = uvd1.diff_vis(uvd2)
# in place option
uvd1.sum_vis(uvd2, inplace=True)
# override a particular parameter
rdate_obj = Time(np.floor(uvd1.time_array[0]), format="jd", scale="utc")
uvd1.rdate = rdate_obj.strftime("%Y-%m-%d")
uvd1.sum_vis(uvd2, inplace=True, override_params=["rdate"])
UVData: Phasing
Adjusting the phase of interferometric visibilities for the purposes of imaging and other analyses is called phasing. See our phasing memo for a detailed description of phasing and the specific implementation details in pyuvdata.
We use the pyuvdata.UVData.print_phase_center_info() to print a nice description
of the phasing of the object.
from astropy.time import Time
from numpy import pi
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
uvh5_file = fetch_data("hera_h3c_uvh5")
uvd = UVData.from_file(uvh5_file)
uvd.print_phase_center_info()
This object is initially unprojected (unphased) so the print command yields:
ID Cat Entry Type Az/Lon/RA El/Lat/Dec Frame
# Name deg deg
----------------------------------------------------------------------
0 zenith unprojected 0:00:00.00 +90:00:00.00 altaz
When phasing, the user needs to supply a name for each phase center, though it does not need to be unique. We are specifying that the type here is “sidereal”, which means that the position is represented by a fixed set of coordinates in a sidereal coordinate frame (e.g., ICRS, FK5, etc).
uvd.phase(lon=5.23368, lat=0.710940, epoch="J2000", cat_name="target1", cat_type="sidereal")
uvd.print_phase_center_info()
Now the print command yields:
ID Cat Entry Type Az/Lon/RA El/Lat/Dec Frame Epoch
# Name hours deg
------------------------------------------------------------------------------
1 target1 sidereal 19:59:28.27 +40:44:01.90 icrs J2000.0
You can use the phase_to_time method to phase to zenith at a particular time. The time can be passed as an astropy Time object or as a float which will be interpreted as a JD
uvd.phase_to_time(Time(uvd.time_array[0], format="jd"))
uvd.print_phase_center_info()
Now the print command yields:
ID Cat Entry Type Az/Lon/RA El/Lat/Dec Frame Epoch
# Name hours deg
------------------------------------------------------------------------------
0 zenith_at_jd2458661.234803 sidereal 13:20:57.92 -30:37:09.44 icrs J2000.0
You can also now phase to “ephem” objects, which move with time, e.g. solar system bodies. The phase method has a lookup_name option which, if set to true, will allow you to search JPL-Horizons for coords
uvd.phase(lon=0, lat=0, epoch="J2000", cat_name="Sun", lookup_name=True)
uvd.print_phase_center_info()
Now the print command yields:
ID Cat Entry Type Az/Lon/RA El/Lat/Dec Frame Epoch Ephem Range Dist V_rad
# Name hours deg Start-MJD End-MJD pc km/s
---------------------------------------------------------------------------------------------------------------------
1 Sun ephem 6:19:28.68 +23:21:44.63 icrs J2000.0 58660.25 58661.00 1.0e+00 0.2157
Finally, we can use a selection mask to only phase part of the data at a time, like only the data belonging to the first integration. In this example we’ll phase the first integration to a “driftscan” target, which is phased to a particular azimuth and elevation (note this is different than “unprojected” data – which used to be designated with phase_type=”drift” – in that it is still phased and can be to any azimuth and elevation, not just zenith). Note that we need to supply phase_frame as “altaz”, since driftscans are always in that frame.
select_mask = uvd.time_array == uvd.time_array[0]
uvd.phase(lon=0, lat=pi/2, cat_name="zenith", phase_frame="altaz", cat_type="driftscan", select_mask=select_mask)
uvd.print_phase_center_info()
Now when using print_phase_center_info, we see that there are multiple phase centers present in the data:
ID Cat Entry Type Az/Lon/RA El/Lat/Dec Frame Epoch Ephem Range Dist V_rad
# Name deg deg Start-MJD End-MJD pc km/s
----------------------------------------------------------------------------------------------------------------------
0 zenith driftscan 0:00:00.00 +90:00:00.00 altaz J2000.0
1 Sun ephem 94:52:10.21 +23:21:44.63 icrs J2000.0 58660.25 58661.00 1.0e+00 0.2157
We can unproject (unphase) data using the unproject_phase method:
uvd.unproject_phase()
uvd.print_phase_center_info()
Now when using print_phase_center_info, we’ll see that all the data are unprojected:
ID Cat Entry Type Az/Lon/RA El/Lat/Dec Frame
# Name deg deg
----------------------------------------------------------------------
2 unprojected unprojected 0:00:00.00 +90:00:00.00 altaz
UVData: Working with large files
To save on memory and time, pyuvdata supports reading only parts of uvfits, uvh5, miriad, and MWA correlator fits files.
Note that select on read (partial reading) is not always faster than reading an entire file and then downselecting. Which approach is faster depends on the fraction of data that is selected as well on the relationship between the selection and the internal data ordering in the file. When the select is on a small area of the file or has a regular stride it can be much faster to do the select on read, but in other cases it can be slower. Select on read does generally reduce the memory footprint.
a) Reading just the metadata of a file
For uvh5, uvfits, FHD and MWA correlator fits files, reading in the only the metadata results in a metadata only UVData object (which has every attribute except the data_array, flag_array and nsample_array filled out). For Miriad files, less of the metadata can be read without reading the data, but many of the attributes are available.
Measurement set (ms) files do not support reading only the metadata (the read_data keyword is ignored for ms files).
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
filename = fetch_data("vla_casa_tutorial_uvfits")
# read the metadata but not the data
uvd = UVData.from_file(filename, read_data=False)
assert uvd.metadata_only
assert uvd.time_array.size == 1360
assert uvd.data_array is None
b) Reading only parts of files
The same options that are available for the pyuvdata.UVData.select() method can
also be passed to the pyuvdata.UVData.read() method to do the select on the read,
saving memory and time if only a portion of the data are needed.
Note that these keywords can be used for any file type, but for FHD and measurement set (ms) files, the select is done after the read, which does not save memory. Miriad and Mir only supports some of the selections on the read, the unsupported ones are done after the read. MWA correlator fits has support for most but not all selections, the unsupported ones are done after the read.
import numpy as np
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
filename = fetch_data("vla_casa_tutorial_uvfits")
uvd = UVData.from_file(filename, freq_chans=np.arange(32))
assert uvd.Nfreqs == 32
# Reading in the metadata can help with specifying what data to read in
uvd = UVData.from_file(filename, read_data=False)
assert uvd.Ntimes == 15
unique_times = np.unique(uvd.time_array)
times_to_keep = unique_times[[0, 2, 4]]
uvd = UVData.from_file(filename, times=times_to_keep)
assert uvd.Ntimes == 3
# Select a few baselines from a miriad file
filename = fetch_data("hera_old_miriad")
uvd = UVData.from_file(filename, bls=[(9, 10), (9, 20)])
assert uvd.get_antpairs() == [(9, 10), (9, 20)]
# Select certain frequencies from a uvh5 file
filename = fetch_data("hera_h3c_uvh5")
uvd = UVData.from_file(filename, freq_chans=np.arange(2))
assert uvd.Nfreqs == 2
c) Writing to a uvh5 file in parts
It is possible to write to a uvh5 file in parts, so not all of the file needs to be in memory at once. This is very useful when combined with partial reading described above, so that operations that in principle require all of the data, such as applying calibration solutions, can be performed even in situations where the available memory is smaller than the size of the file.
Partial writing requires two steps: initializing an empty file on disk with the correct metadata for the final object, and then subsequently writing the data in stages to that same file. In this latter stage, the same syntax for performing a selective read operation is used, so that the user can precisely specify which parts of the data, flags, and nsample arrays should be written to. The user then also provides the data, flags, and nsample arrays of the proper size, and they are written to the appropriate parts of the file on disk.
import os
import numpy as np
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
filename = fetch_data("hera_h3c_uvh5")
uvd = UVData.from_file(filename, read_data=False)
partfile = os.path.join('.', 'tutorial_partial_io.uvh5')
uvd.initialize_uvh5_file(partfile, clobber=True)
# read in the lower and upper halves of the band separately, and apply different scalings
Nfreqs = uvd.Nfreqs
Hfreqs = Nfreqs // 2
freq_inds1 = np.arange(Hfreqs)
freq_inds2 = np.arange(Hfreqs, Nfreqs)
uvd2 = UVData()
uvd2 = UVData.from_file(filename, freq_chans=freq_inds1)
data_array = 0.5 * uvd2.data_array
flag_array = uvd2.flag_array
nsample_array = uvd2.nsample_array
uvd.write_uvh5_part(
partfile,
data_array=data_array,
flag_array=flag_array,
nsample_array=nsample_array,
freq_chans=freq_inds1
)
uvd2 = UVData.from_file(filename, freq_chans=freq_inds2)
data_array = 2.0 * uvd2.data_array
flag_array = uvd2.flag_array
nsample_array = uvd2.nsample_array
uvd.write_uvh5_part(
partfile,
data_array=data_array,
flag_array=flag_array,
nsample_array=nsample_array,
freq_chans=freq_inds2
)
UVData: Working with Redundant Baselines
a) Finding Redundant Baselines
The method pyuvdata.UVData.get_redundancies() provides options for finding
redundant groups of baselines in an array, either by antenna positions or uvw
coordinates. Baselines are considered redundant if they are within a specified tolerance
distance (default is 1 meter).
The default behavior is to use the uvw_array on the object (representing the baselines
that have data on the object) to find redundancies among the uvw vectors. If the
include_conjugates option is set, it will include baselines that are redundant when
reversed in the same group. In this case, a list of conjugates is returned as well,
which contains indices for the baselines that were flipped for the redundant groups.
If the use_antpos keyword is set, antenna_positions will be used to calculate
redundancies instead of the uvw_array. This can result in different behavior because
all possible redundant baselines will be returned, not just the ones with data on the
object. In this case, the baselines are defined in the u>0 convention, so some of the
baselines may be conjugated relative to the baselines with data on the object. If the
conjugate_bls keyword is set, it will also update the baseline conjugation on the object
so that the baselines in the returned groups correspond with the baselines listed on the
object (except for antenna pairs with no associated data).
There are also utility functions to get redundant groups from either a list of baselines
vectors and corresponding baseline indices
(pyuvdata.utils.redundancy.get_baseline_redundancies())
or antenna positions and antenna indices
(pyuvdata.utils.redundancy.get_antenna_redundancies()). Note that using these utility
functions for the baselines on an object is less memory efficient than using
pyuvdata.UVData.get_redundancies() because the latter only uses the first time in
the baseline array.
from pyuvdata import utils, UVData
from pyuvdata.datasets import fetch_data
# This file contains a HERA19 layout.
uvd = UVData.from_file(fetch_data("sim_airy_hex"))
uvd.unproject_phase(use_ant_pos=True)
tol = 0.05 # Tolerance in meters
# Returned values: list of redundant groups, corresponding mean baseline vectors, baseline lengths. No conjugates included, so conjugates is None.
baseline_groups, vec_bin_centers, lengths = uvd.get_redundancies(tol=tol, include_conjugates=False)
assert len(baseline_groups) == 19
# The include_conjugates option includes baselines that are redundant when reversed.
# If used, the conjugates list will contain a list of indices of baselines that must be flipped to be redundant.
baseline_groups, vec_bin_centers, lengths, conjugates = uvd.get_redundancies(tol=tol, include_conjugates=True)
assert len(baseline_groups) == 19
# Using antenna positions instead. This has one more group because it includes the autos.
baseline_groups, vec_bin_centers, lengths = uvd.get_redundancies(tol=tol, use_antpos=True)
assert len(baseline_groups) == 20
# get_redundancies has the option to ignore autocorrelations.
baseline_groups, vec_bin_centers, lengths = uvd.get_redundancies(tol=tol, use_antpos=True, include_autos=False)
assert len(baseline_groups) == 19
b) Compressing/inflating on Redundant Baselines
Since redundant baselines should have similar visibilities, some level of data
compression can be achieved by only keeping one out of a set of redundant baselines.
The pyuvdata.UVData.compress_by_redundancy() method will find groups of baselines that are
redundant to a given tolerance and either average over them or select a single
baseline from the redundant group. If the data are identical between redundant
baselines (e.g. if they are from a noiseless simulation) the “select” method
should be used as it is much faster. If the “average” method is used, the data
are combined with a weighted average using the nsample_array as weights
and the final nsample_array will be a sum of the nsample_array of the
combined baselines (so it can be larger than 1).
This action is (almost) inverted by the pyuvdata.UVData.inflate_by_redundancy()
method, which finds all possible baselines from the antenna positions and fills
in the full data array based on redundancy.
import numpy as np
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
uv0 = UVData.from_file(fetch_data("sim_airy_hex"))
tol = 0.02 # In meters
# Compression can be run in-place or return a separate UVData object.
uv_backup = uv0.copy()
uvd2 = uv0.compress_by_redundancy(method="select", tol=tol, inplace=False)
uv0.compress_by_redundancy(method="select", tol=tol)
assert uvd2 == uv0
# Note -- Compressing and inflating changes the baseline order, reorder before comparing.
uv0.inflate_by_redundancy(tol=tol)
uv_backup.reorder_blts(conj_convention="u>0", uvw_tol=tol)
uv0.reorder_blts()
assert np.all(uv0.baseline_array == uv_backup.baseline_array)
uvd2.inflate_by_redundancy(tol=tol)
assert uvd2 == uv0
UVData: flex_pol objects
Support for flex_pol objects was initially developed to support data sets where
spectral windows have different polarizations, which can be generated by some telescopes
(e.g. the SMA). In these objects, it makes sense to combine the polarization and
frequency axis. To represent this on a UVData object, the flex_spw_polarization_array
is set to an array of length Nspws containing the polarization number for each
spectral window and the polarization_array is set to a length-1 array containing 0,
which is not polarization number that is used for any existing polarization, and the
Npols parameter is set to 1. A consequence of this is that each spectral window can
only have one polarization, which is standard for the SMA when the spectral windows are
divided (because cross polarization visibilities do not make sense across frequencies
and are not recorded). To represent this kind of data without the flex_pol structure,
the data-like arrays would need to be padded out with flagged data for the polarizations
that were not observed, making the object much larger than required. This padding and
flagging can be done with the pyuvdata.UVData.remove_flex_pol() method, but it can
substantially increase the size of the object in memory. This is done in the write
methods for some data file types (i.e. uvfits, miriad) which do not support flex_pol
style data.
Subsequently it was realized that this infrastructure could also be leveraged to write
a file with the polarization axis changing slowest, which was desired for the HERA
telescope because the most common data access pattern was to operate on only some of the
polarizations at a time. In that situation, all polarizations are observed in all the
spectral windows, but each polarization can be split into separate spectral windows and
the data-like arrays can be reshaped to make the polarization axis change slowest.
The pyuvdata.UVData.convert_to_flex_pol() method can be used to convert a standard
UVData object into a flex_pol object by splitting the polarizations into separate
spectral windows. The pyuvdata.UVData.remove_flex_pol() method with the
combine_spws parameter set to True (which is the default) can be used to
convert back to a standard UVData object.
a) Working with an SMA flex_pol object
We do not currently have an SMA flex_pol file in the repo, so we first generate one using some low-level Mir interfaces, then explore some of its properties and convert it into a standard UVData object (which doubles its size).
import numpy as np
from pyuvdata import UVData
from pyuvdata.uvdata.mir import Mir
from pyuvdata.uvdata.mir_parser import MirParser
from pyuvdata.datasets import fetch_data
mir_file = fetch_data("sma_mir")
mir_data = MirParser(mir_file, load_cross=True, load_auto=True, has_auto=True)
# Read in the raw data so that we can manipulate it, and make it look like the
# test data set was recorded with split-tuning
mir_data.sp_data._data["gunnLO"][np.isin(mir_data.sp_data["blhid"], [1, 3])] += 30.0
mir_data.sp_data._data["fsky"][np.isin(mir_data.sp_data["blhid"], [1, 3])] += 30.0
# Convert MirParser object into a UVData object with flex-pol enabled.
mir_uv = UVData()
mir_obj = Mir()
mir_obj._init_from_mir_parser(mir_data)
mir_uv._convert_from_filetype(mir_obj)
# Start exploring SMA flex_pol object
assert mir_uv.spw_array.tolist() == [-255, -4, -3, -2, -1, 1, 2, 3, 4, 255, 257, 508, 509, 510, 511, 513, 514, 515, 516, 767]
print(mir_uv.flex_spw_polarization_array.tolist())
assert mir_uv.flex_spw_polarization_array.tolist() == [-5, -5, -5, -5, -5, -5, -5, -5, -5, -5, -6, -6, -6, -6, -6, -6, -6, -6, -6, -6]
assert mir_uv.polarization_array == 0
assert mir_uv.data_array.shape == (1, 262160, 1)
# Use the ``remove_flex_pol`` method to get a standard object. Note that it
# doubles the data_array size
mir_uv.remove_flex_pol(combine_spws=False)
print(mir_uv.spw_array)
assert mir_uv.spw_array.tolist() == [-255, -4, -3, -2, -1, 1, 2, 3, 4, 255, 257, 508, 509, 510, 511, 513, 514, 515, 516, 767]
assert mir_uv.flex_spw_polarization_array is None
assert mir_uv.polarization_array.tolist() == [-6, -5]
assert mir_uv.data_array.shape == (1, 262160, 2)
b) Converting between standard and flex_pol objects
Here we load a HERA dataset and convert it to a flex-pol object, which reshapes the data_array but does not change its total size and then back, recovering the initial object.
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
uvd = UVData.from_file(fetch_data("hera_h3c_uvh5"))
# make a copy to enable comparisons after converting to and from flex_pol
uvd_orig = uvd.copy()
assert uvd.polarization_array.tolist() == [-5, -6]
assert uvd.spw_array.tolist() == [0]
assert uvd.flex_spw_polarization_array is None
assert uvd.data_array.shape == (200, 4, 2)
uvd.convert_to_flex_pol()
assert uvd.polarization_array.tolist() == [0]
assert uvd.spw_array.tolist() == [0, 1]
assert uvd.flex_spw_polarization_array.tolist() == [-5, -6]
assert uvd.data_array.shape == (200, 8, 1)
uvd.remove_flex_pol()
assert uvd.polarization_array.tolist() == [-5, -6]
assert uvd.spw_array.tolist() == [0]
assert uvd.flex_spw_polarization_array is None
assert uvd.data_array.shape == (200, 4, 2)
UVData: Instantiating from arrays in memory
pyuvdata can also be used to create a UVData object from arrays in memory. This
is useful for mocking up data for testing or for creating a UVData object from
simulated data. Instead of instantiating a blank object and setting each required
parameter, you can use the pyuvdata.Telescope.new() static method, which
deals with the task of creating a consistent object from a minimal set of inputs
from astropy.coordinates import EarthLocation
import numpy as np
from pyuvdata import Telescope, UVData
uvd = UVData.new(
freq_array = np.linspace(1e8, 2e8, 100),
polarization_array = ["xx", "yy"],
telescope = Telescope.new(
antenna_positions = {
0: [0.0, 0.0, 0.0],
1: [0.0, 0.0, 1.0],
2: [0.0, 0.0, 2.0],
},
location = EarthLocation.from_geodetic(0, 0, 0),
name = "test",
instrument = "test",
),
times = np.linspace(2459855, 2459856, 20),
)
Notice that you need only provide the required parameters, and the rest will be
filled in with sensible defaults. The telescope related metadata is passed
directly to a simple Telescope constructor which also only requires the minimal
set of inputs but can accept any other parameters supported by the class.
Importantly, the times and baselines can be provided either as unique values,
with the intention that their cartesian outer product should be
used (i.e. the combination of each provided time with each baseline), or as full
length-Nblt arrays (if you don’t require all combinations). While this behaviour can
be inferred, it is best to set the do_blt_outer keyword to True or False
to enable this. Let us for example create an unusual object with 4 times and 4 baselines,
where each baseline observed one time each. This case is ambiguous without the
do_blt_outer keyword, so we must set it:
from astropy.coordinates import EarthLocation
import numpy as np
from pyuvdata import Telescope, UVData
times = np.array([2459855.0, 2459855.1, 2459855.2, 2459855.3])
antpairs = [(0, 1), (0, 2), (1, 2), (1, 1)]
uvd = UVData.new(
freq_array = np.linspace(1e8, 2e8, 100),
polarization_array = ["xx", "yy"],
telescope = Telescope.new(
antenna_positions = {
0: [0.0, 0.0, 0.0],
1: [0.0, 0.0, 1.0],
2: [0.0, 0.0, 2.0],
},
location = EarthLocation.from_geodetic(0, 0, 0),
name = "test",
instrument = "test",
),
times = times,
antpairs=antpairs,
do_blt_outer=False,
)
assert uvd.Nblts == 4
assert np.allclose(uvd.time_array, times)
Notice that the resulting object only has 4 baseline-times. If we had set
do_blt_outer = True, we would have gotten the cartesian outer product of the
provided times and baselines, which would have resulted in 16 baseline-times:
from astropy.coordinates import EarthLocation
import numpy as np
from pyuvdata import Telescope, UVData
times = np.array([2459855.0, 2459855.1, 2459855.2, 2459855.3])
antpairs = [(0, 1), (0, 2), (1, 2), (1, 1)]
uvd_rect = UVData.new(
freq_array = np.linspace(1e8, 2e8, 100),
polarization_array = ["xx", "yy"],
telescope = Telescope.new(
antenna_positions = {
0: [0.0, 0.0, 0.0],
1: [0.0, 0.0, 1.0],
2: [0.0, 0.0, 2.0],
},
location = EarthLocation.from_geodetic(0, 0, 0),
name = "test",
instrument = "test",
),
times = times,
antpairs=antpairs,
do_blt_outer=True,
)
assert uvd_rect.Nblts == 16
assert np.allclose(uvd_rect.time_array, np.repeat(times, 4))
To change the order of the blt-axis, set the time_axis_faster_than_bls keyword:
from astropy.coordinates import EarthLocation
import numpy as np
from pyuvdata import Telescope, UVData
times = np.array([2459855.0, 2459855.1, 2459855.2, 2459855.3])
antpairs = [(0, 1), (0, 2), (1, 2), (1, 1)]
uvd_rect = UVData.new(
freq_array = np.linspace(1e8, 2e8, 100),
polarization_array = ["xx", "yy"],
telescope = Telescope.new(
antenna_positions = {
0: [0.0, 0.0, 0.0],
1: [0.0, 0.0, 1.0],
2: [0.0, 0.0, 2.0],
},
location = EarthLocation.from_geodetic(0, 0, 0),
name = "test",
instrument = "test",
),
times = times,
antpairs=antpairs,
do_blt_outer=True,
time_axis_faster_than_bls=True,
)
assert uvd_rect.Nblts == 16
assert np.allclose(
uvd_rect.time_array, np.repeat(times[np.newaxis, :], 4, axis=0).flatten()
)
See the full documentation for the method
pyuvdata.uvdata.UVData.new() for more information.
UVData: Location conversions
A number of conversion methods exist to map between different coordinate systems for locations on the earth, see Functions for working with telescope coordinates for all options.
Note that the UVData.telescope.location attribute is an
astropy.EarthLocation object, so it can be used directly to get to any
astropy supported coordinate system.
a) Getting antenna positions in East, North, Up (ENU) frame in units of meters
Note that the ENU frame is sometimes referred to as the topocentric frame but in many references the topocentric frame has the pole on the axis of rotation for the Earth rather than at the local zenith. We just call it the ENU frame for clarity.
Use the pyuvdata.Telescope.get_enu_antpos() to get the antenna
positions in the ENU frame. Or use the telescope.location and
telescope.antenna_positions attributes (which are ECEF positions relative
to the telescope.location) with the pyuvdata.utils.ENU_from_ECEF()
utility method.
# directly from Telescope object
from astropy.units import Quantity
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
data_file = fetch_data("paper_2012_miriad")
uvd = UVData.from_file(data_file)
antpos = uvd.telescope.get_enu_antpos()
# using utils
from pyuvdata import utils
# get antennas positions in ECEF
telescope_ecef_xyz = Quantity(uvd.telescope.location.geocentric).to_value("m")
antpos = uvd.telescope.antenna_positions + telescope_ecef_xyz
# convert to East, North, Up (ENU) coords.
antpos = utils.ENU_from_ECEF(antpos, center_loc=uvd.telescope.location)
UVData: Normalizing data
If autocorrelation data is stored alongside cross-correlations, then one can convert
arbitrarily-scaled data in data_array to units of correlation coefficients by way of
the pyuvdata.UVData.normalize_by_autos() method. In this normalization step, each
cross-correlation visibility (frequency channel on a given baseline for a particular
integration) is divided by the geometric mean of the two autocorrelations, belonging
to the two antennas that make up the baseline for the cross-correlation. As the
statistical uncertainty in the amplitude of the autos is under most circumstances
relatively much less than that for the crosses, performing this step affords one the
ability to perform some basic flux scaling of the data provided some a priori
information about the antennas (namely the system temperature and the so-called “forward
gain” of the antenna, which typically depend on geometric size and aperture efficiency).
Note that when normalizing, if the corresponding autocorrelations are not found or are
otherwise marked as bad in flag_array, then the the cross-correlation will be
flagged as well (e.g., if all of antenna 1’s autos are flagged, then every baseline that
contains antenna 1 will also be flagged).
import numpy as np
from pyuvdata import UVData
from pyuvdata.datasets import fetch_data
uvd = UVData.from_file(fetch_data("hera_h3c_uvh5"))
# Build a binary mask where the cross-correlations are stored.
cross_mask = uvd.ant_1_array != uvd.ant_2_array
# Check to see that all the crosses have amplitudes greater than 1
assert np.all(np.abs(uvd.data_array[cross_mask]) > 1)
# On normalization, you can convert arb scaled data to correlation coefficients,
# which should always be less than 1 in amplitude.
uvd.normalize_by_autos()
assert np.all(np.abs(uvd.data_array[cross_mask]) < 1)
# An important note for using normalize_by_autos is that it will usually leave the
# autos alone unless told otherwise, in order to make reverting normalization
# possible. We can see this by checking the values of the autos.
assert not np.any(uvd.data_array[~cross_mask] < 1)
# Finally, we can undo the above by setting invert=True.
uvd.normalize_by_autos(invert=True)
assert np.all(np.abs(uvd.data_array[cross_mask]) > 1)