nahiku package

Submodules

nahiku.balmung module

This code originally developed by Dan Hey at https://github.com/danhey/balmung/blob/master/balmung/balmung.py This file is used with permission for the development of the Nahiku package.

class nahiku.balmung.Balmung(time: ndarray, flux: ndarray)[source]

Bases: object

amplitude_spectrum(t, y, fmin: float = None, fmax: float = None, oversample_factor: float = 5.0) → tuple[source]

Calculates the amplitude spectrum at a given time and flux input

Parameters:

t (np.ndarray) – Time values
y (np.ndarray) – Flux values
fmin (float, optional) – Minimum frequency. Defaults to None.
fmax (float, optional) – Maximum frequency. Defaults to None.
oversample_factor (float, optional) – Amount by which to oversample the light curve. Defaults to 5.0.

Returns:

Frequency and amplitude arrays

Return type:

tuple

dft_phase(x: ndarray, y: ndarray, f: float) → float[source]

Calculates the phase at a single frequency using the Discrete Fourier Transform

Parameters:

x (np.ndarray) – Time values
y (np.ndarray) – Flux values
f (float) – Frequency

Returns:

Phase at given frequency

Return type:

float

estimate_background(x: ndarray, y: ndarray, log_width: float = 0.01) → ndarray[source]

Estimates the background signal

Parameters:

x (np.ndarray) – [description]
y (np.ndarray) – [description]
log_width (float, optional) – [description]. Defaults to 0.01.

Returns:

[description]

Return type:

[type]

find_highest_peak(f: ndarray, a: ndarray) → float[source]

Uses three point parabolic interpolation to find the highest peaks in the amplitude spectrum

Parameters:

f (np.ndarray) – Frequency values
a (np.ndarray) – Amplitude values

Returns:

Maximum frequency

Return type:

float

fit(theta: list) → ndarray[source]

Small wrapper for curve_fit.

Parameters:

time (np.ndarray) – Time values
flux (np.ndarray) – Flux values
theta (list) – Array-like of initial guesses

Returns:

Fitted parameters

Return type:

list

grad_model(time: ndarray, freq: float, amp: float, phi: float) → ndarray[source]

Gradient function of our pulsation model

Parameters:

time (np.ndarray) – Time values
freq (float) – Frequency
amp (float) – Amplitude
phi (float) – Phase

Returns:

Gradient vector (dModel/d_{freq,amp,phi})

Return type:

np.ndarray

initialize_guess(fmin: float, fmax: float)[source]

model(time: ndarray, freq: float, amp: float, phi: float) → ndarray[source]

And at the heart of it all, a tiny model function.

Parameters:

time (np.ndarray) – Time values
freq (float) – Frequency
amp (float) – Amplitude
phi (float) – Phase

Returns:

Sinusoid at the given parameters

Return type:

np.ndarray

plot(ax=None)[source]

plot_lc(ax=None)[source]

plot_residual(ax=None)[source]

prewhiten(fmin=None, fmax=None, minimum_snr=0, maxiter=200, diagnose=True)[source]

nahiku.exhaustive_helpers module

nahiku.exhaustive_helpers.compute_interval_pvalue(y_true, mvn_pred)[source]

This function computes the Mahalanobis distance and corresponding p-value for a true observation under a predicted multivariate normal distribution. It attempts to compute the Mahalanobis distance using Cholesky decomposition for numerical stability, and falls back to direct solving if Cholesky fails (e.g., if the covariance is not positive definite). The p-value is computed based on the chi-squared distribution with degrees of freedom equal to the dimensionality of the data.

Parameters:

y_true (torch.Tensor) – shape (d,)
mvn_pred (MultivariateNormal) – MultivariateNormal from GPyTorch (mean: (d,), covar: (d,d))

nahiku.exhaustive_helpers.interval_posterior_from_precision(mu, J, full_y, mask_train, mask_test, dtype=torch.float64)[source]

Compute the posterior distribution p(y_test | y_train) using precision matrix J and mean mu. This function computes the conditional posterior distribution over test points given training data, using the precision (inverse covariance) matrix and the prior mean. It leverages the block structure of the precision matrix to efficiently compute the conditional distribution.

Parameters:

mu (torch.Tensor) – (N,) mean vector over the full grid
J (torch.Tensor) – (N, N) precision matrix over the full grid, where J = (K + σ²I)^{-1}
full_y (torch.Tensor) – (N,) output values on the full grid
mask_train (torch.Tensor or bool array) – (N,) boolean mask indicating training data points
mask_test (torch.Tensor or bool array) – (N,) boolean mask indicating test data points
dtype (torch.dtype) – Data type for computations (default: torch.float64)

nahiku.exhaustive_helpers.precompute_precision(full_x, mean_module, kernel_module, noise_variance, dtype=torch.float64, device='cpu')[source]

Build μ and J = (K + σ²I)^{-1} for the full X once. Use float64 here for numerical stability; you can cast later if desired.

Parameters:

full_x (torch.Tensor) – (N, D) tensor of all input locations
mean_module (GPyTorch mean module) – GPyTorch mean module that can be called on full_x
kernel_module (GPyTorch kernel module) – GPyTorch kernel module that can be called on full_x to produce a LazyTensor covariance
noise_variance (float) – observation noise variance σ²
dtype (torch.dtype) – torch dtype for computations (default: torch.float64 for stability)
device (str) – torch device for computations (default: “cpu”)

nahiku.exhaustive_search module

class nahiku.exhaustive_search.ExhaustiveSearch(x, y, dominant_period, device='cpu', min_anomaly_len=50, max_anomaly_len=100, window_slide_step=100, window_size_step=50, assume_independent=True, which_test_metric='pval')[source]

Bases: Search

This class implements an exhaustive search algorithm to identify anomalous intervals in a time series using Gaussian Processes.

Method:

List every possible contiguous interval in the time series that could contain an anomaly, based on priors (e.g., minimum and maximum anoamly duration).
Fit the entire time series with a GP and store the optimized parameters (and, optionally, the full precision matrix if using dynamic programming).
For each candidate interval, compute the posterior likelihood of the test interval given a MultivariateNormal distribution fit to the rest of the data. Optionally, use dynamic programming to compute the posterior likelihoods more efficiently by leveraging the precision matrix of the full GP fit.
Flag intervals as anomalous if a metric measuring posterior likelihood is below a certain threshold (e.g. mahalanobis distance below some threshold)

search_for_anomaly(filename='', refit=False, neg_anomaly_only=False, pos_anomaly_only=False, dynamic_programming=False, threshold=0.001, num_intervals_to_flag=None, silent=True, plot=False, **kwargs)[source]

Main function to perform the exhaustive search for anomalies in the time series data.

Parameters:

filename (str) – If provided, saves the results to this file (default: “”)
refit (bool) – If true, refit the GP for each interval. If false, use the same GP for all intervals (faster but less accurate) (default: False)
neg_anomaly_only (bool) – Whether to only flag negative anomalies (i.e., dips) instead of both positive and negative anomalies (default: False)
pos_anomaly_only (bool) – Whether to only flag positive anomalies (i.e., flares) instead of both positive and negative anomalies (default: False)
dynamic_programming (bool) – If true, use dynamic programming to find the best interval. Only works if refit = False (default: False)
threshold (float) – Threshold for flagging an interval as anomalous based on the test metric (default: 1e-5)
num_intervals_to_flag (int or None) – If not None, flag the top num_intervals_to_flag intervals as anomalous based on the test metric, instead of using a threshold (default: None)
silent (bool) – If true, suppresses print statements during training (default: True)
plot (bool) – If true, plots the GP prediction and p-value for each candidate interval (default: False)
kwargs – Additional keyword arguments to pass to the GP training function, such as training_iterations, lr, early_stopping, etc.

nahiku.gp_helpers module

class nahiku.gp_helpers.ExactGPModel(train_x, train_y, likelihood, kernel, mean)[source]

Bases: ExactGP

Initialize the ExactGPModel with a specified kernel and mean function, and define the forward method to compute the GP output.

forward(x)[source]

Compute the forward pass of the ExactGPModel.

Parameters:: x (torch.Tensor) – Input data points

class nahiku.gp_helpers.ParameterizedGPModel(kernel, mean)[source]

Bases: GP

Base GP model that allows for parameterized kernels and mean functions, without the need for training data at initialization.

forward(x)[source]

Compute the forward pass of the ParameterizedGPModel.

Parameters:: x (torch.Tensor) – Input data points

class nahiku.gp_helpers.QuasiPeriodicKernel(periodic_kernel=None, rbf_kernel=None, **kwargs)[source]

Bases: Kernel

A custom kernel that combines a periodic kernel with an RBF kernel to model quasi-periodic behavior in light curves.

forward(x1, x2, diag=False, **params)[source]

Compute the kernel value between two sets of inputs by combining the periodic and RBF kernels.

Parameters:

x1 (torch.Tensor) – First set of input points
x2 (torch.Tensor) – Second set of input points
diag (bool) – Whether to compute only diagonal elements

nahiku.greedy_search module

class nahiku.greedy_search.GreedySearch(x, y, dominant_period, device='cpu', which_grow_metric='mll', y_err=None, num_sigma_threshold=3, expansion_param=1, len_deviant=1)[source]

Bases: Search

This class implements a greedy search algorithm to identify anomalous intervals in a time series using Gaussian Processes.

Method:

Perform GP regression on the time series.
Find the most significant outlier interval (based on sum of residuals) of length len_deviant.
Exclude the outlier interval and redo regression. See if GP improves by some threshold.
Expand outlier interval in both directions by expansion_param and redo step 3.
Repeat step 4 as long as GP improves the fit by some threshold.
If no improvement, define anomaly signal as the difference between data and regression in the outlier interval of points.
Repeat steps 2-6 while there are still points above the num_sigma_threshold.

plot_greedy(x, pred_mean, left_edge, right_edge, residuals)[source]

Plot the light curve, GP fit, and detected anomalies at each iteration of the greedy search.

Parameters:

x (torch.Tensor) – x values used for GP prediction at the current iteration. Only needed to plot against pred_mean Every other plotting call will use self.x or self.x_orig
pred_mean (np.ndarray) – Array of the GP mean predictions corresponding to self.x at the current iteration
left_edge (int) – Left edge index of the currently flagged anomalous region
right_edge (int) – Right edge index of the currently flagged anomalous region
residuals (np.ndarray) – Array of residuals (absolute value of observed - predicted) for the current iteration

search_for_anomaly(refit=False, neg_anomaly_only=False, pos_anomaly_only=False, plot=False, detection_range=None, update_threshold=False, **kwargs)[source]

Main function to perform the greedy search for anomalies in the time series data.

Parameters:

refit (bool) – Whether to refit the GP model at each iteration of the greedy search (default: True)
neg_anomaly_only (bool) – Whether to only flag negative anomalies (i.e., dips) instead of both positive and negative anomalies (default: False)
pos_anomaly_only (bool) – Whether to only flag positive anomalies (i.e., flares) instead of both positive and negative anomalies (default: False)
plot (bool) – Whether to the light curve, GP fit, and detected anomalies at each iteration of the greedy search (default: False)
detection_range (tuple or None) – Tuple specifying the range of x values to consider for anomaly detection. If None, considers the entire range of x. Default is None.
update_threshold (bool) – Whether to update the num_sigma_threshold after each detected anomaly
kwargs – Additional keyword arguments to pass to the GP training function, such as training_iterations, lr, early_stopping, etc.

nahiku.nahiku module

class nahiku.nahiku.Nahiku(time, flux, anomalies=None, prominence=50, plot_dominant_period=False)[source]

Bases: object

This class represents a light curve and provides methods for anomaly detection using greedy and exhaustive search, as well as methods for plotting, standardizing, prewhitening, calculating the dominant period, injecting anomalies, and checking the accuracy of identified anomalies against true and injected anomalies.

check_identified_anomalies(buffer=5)[source]

Check the identified anomalies against the true and injected anomalies, and print out the results.

Parameters:: buffer (int) – number of indices on either side of the true and injected anomaly indices to consider as a match for an identified anomaly (default: 5)

exhaustive_search(**kwargs)[source]: Initialize and run a ExhaustiveSearch. Accepts all arguments for ExhaustiveSearch.__init__ and ExhaustiveSearch.search_for_anomaly.

static freq_idx_to_period_days(freqs_idx, times)[source]

Function to convert frequency indices from a periodogram to periods in days, using the time points of the original data to calculate the scaling factor for the conversion.

Parameters:

freqs_idx (1D array-like) – Array of frequency indices to convert to periods in days.
times (1D array-like) – Array of time points corresponding to the original data, used to calculate the scaling factor for converting frequency indices to periods in days.

static from_lightkurve(**kwargs)[source]

Load a light curve using the lightkurve.search_lightcurve function, with options to specify various parameters for the search. Documentation for lightkurve.search_lightcurve parameters can be found here: https://lightkurve.github.io/lightkurve/reference/api/lightkurve.search_lightcurve.html

Parameters:: kwargs – Additional keyword arguments to pass to lightkurve.search_lightcurve and Nahiku.__init__

static from_synthetic_parameterized_gp(num_days=100, num_steps=1000, seed=48, add_high_residuals=False, device='cpu', mean_constant=None, outputscale=None, periodic_lengthscale=None, period=None, rbf_lengthscale=None, noise_std=None, num_high_residuals=None, mean_high_residuals=None, var_high_residuals=None, **kwargs)[source]

Sample a function from a Gaussian Process with a scaled quasi-periodic kernel, constant mean, and Gaussian likelihood, with options to specify various parameters for the kernel, mean function, likelihood noise, and number high residuals to add in.

Parameters:

num_days (float) – total duration of the light curve in days (default: 100)
num_steps (int) – number of time steps in the light curve (default: 1000)
seed (int) – random seed for reproducibility (default: 48)
add_high_residuals (bool) – whether to add high residuals to the sampled function to create more challenging anomalies (default: False)
device (str) – device to use for GP sampling, either “cpu” or “cuda” (default: “cpu”)
mean_constant (float or None) – constant value for the mean function. If not provided, randomly chosen between -1 and 1 (Optional)
outputscale (float or None) – output scale for the scaled quasi-periodic kernel. If not provided, randomly chosen between 0.1 and 10 (Optional)
periodic_lengthscale (float or None) – length scale for the periodic component of the quasi-periodic kernel. If not provided, randomly chosen between 0.5 and num_days/4 (Optional)
period (float or None) – period for the periodic component of the quasi-periodic kernel. If not provided, randomly chosen between 0.5 and num_days (Optional)
rbf_lengthscale (float or None) – length scale for the RBF component of the quasi-periodic kernel. If not provided, randomly chosen between 0.5 and num_days/2 (Optional)
noise_std (float or None) – standard deviation of the Gaussian noise to add to the sampled function. If not provided, randomly chosen between 0.1 and 1 (Optional)
num_high_residuals (int or None) – number of high residuals to add to the sampled function if add_high_residuals is True. If not provided, randomly chosen between 5 and 25 (Optional)
mean_high_residuals (float or None) – mean of the Gaussian distribution to sample the high residuals from if add_high_residuals is True. If not provided, randomly chosen between -1 and 1 (Optional)
var_high_residuals (float or None) – variance of the Gaussian distribution to sample the high residuals from if add_high_residuals is True. If not provided, randomly chosen between 0.1 and 10 (Optional)
kwargs – Additional keyword arguments to pass to Nahiku.__init__

static from_synthetic_parameterized_noise(num_days=100, num_steps=1000, seed=48, rednoise_amp=1.0, whitenoise_amp=1.0, period=None, phase=None, amp=None, slope=None, rednoise_time_scale=None, random_noise_step_loc=None, **kwargs)[source]

Generate a synthetic light curve with a sinusoidal signal, white noise, red noise, a step function anomaly, and a linear trend.

Parameters:

num_days (float) – total duration of the light curve in days (default: 100)
num_steps (int) – number of time steps in the light curve (default: 1000)
seed (int) – random seed for reproducibility (default: 48)
rednoise_amp (float) – amplitude of red noise (default: 1.0)
whitenoise_amp (float) – amplitude of white noise (default: 1.0)
period (float) – period of the sinusoidal signal (default: randomly chosen between 175 and 225)
phase (float) – phase of the sinusoidal signal (default: randomly chosen between 0 and 2*pi)
amp (float) – amplitude of the sinusoidal signal (default: randomly chosen between 0 and 0.9)
slope (float) – slope of the linear trend (default: randomly chosen between -0.001 and 0.001)
rednoise_time_scale (float) – correlation time scale of the red noise (default: randomly chosen between 5 and 15)
random_noise_step_loc (float) – location of the step function anomaly (default: randomly chosen between 0 and num_steps)
kwargs – Additional keyword arguments to pass to Nahiku.__init__

get_dominant_period(prominence=50, plot=False)[source]

Function to calculate the dominant period of a light curve using the periodogram and peak detection. It also includes an option to plot the periodogram and the light curve with the dominant period sinusoid.

Parameters:

prominence (int) – minimum prominence of peaks to consider in the periodogram (default: 50)
plot (bool) – whether to plot the periodogram and light curve (default: False)

static get_events(indices)[source]

Helper to turn a list of indices into a list of (start, end) tuples.

Parameters:: indices (list of int) – list of indices to group into events

greedy_search(**kwargs)[source]: Initialize and run a GreedySearch. Accepts all arguments for GreedySearch.__init__ and GreedySearch.search_for_anomaly.

inject_anomaly(num_anomalies, absolute_width=None, absolute_depth=None, idxs=None, seed=48, shapes=['gaussian', 'saw', 'exocomet'], period_scale=None, snr=None, alpha=1)[source]

Inject an anomaly into the light curve, with options to specify the number of anomalies, their shapes, widths, depths, and locations.

Parameters:

num_anomalies (int) – number of anomalies to inject
absolute_width (float or None) – absolute width of the anomaly. If specified, period_scale is ignored (default: None)
absolute_depth (float or None) – absolute depth of the anomaly. If specified, snr is ignored (default: None)
idxs (list of float or None) – list of indices to inject anomalies at. If None, indices are randomly chosen (default: None)
seed (int) – random seed for reproducibility (default: 48)
shapes (list of str) – list of shapes to choose from for the anomalies. Options are “gaussian” for gaussian-shaped anomalies, “saw” for sawtooth-shaped anomalies, and “exocomet” for exocomet-shaped anomalies. Default is [“gaussian”, “saw”, “exocomet”].
period_scale (float or None) – ratio of the dominant period to use as the width of the anomaly. If None, randomly chosen between 0.1 and 5 (default: None)
snr (float or None) – signal to noise ratio of the anomaly. If None, randomly chosen between 0.5 and 10 (default: None)
alpha (float) – shape parameter for the exocomet profile, which controls the asymmetry of the anomaly. Higher values of alpha result in a more asymmetric profile with a steeper ingress and a shallower egress (default: 1)

plot(show_identified_points=True)[source]

Plot the light curve with shaded regions for injected/true anomalies and optional red x’s for identified anomalies.

Parameters:: show_identified_points (bool) – whether to plot the identified anomalous points as red x’s (default: True)

prewhiten(plot=False, **kwargs)[source]

Prewhiten a light curve using the balmung.prewhiten function, with options to specify various parameters for the removal of frequencies. Code for balmung.prewhiten can be found here: https://github.com/danhey/balmung/blob/master/balmung/balmung.py

Parameters:

plot (bool) – whether to plot the light curve before and after prewhitening (default: True)
kwargs – Additional keyword arguments to pass to balmung.prewhiten and Nahiku.__init__

standardize()[source]: Function to standardize the flux of the light curve by subtracting the mean and dividing by the standard deviation. This is important for the periodogram calculation and GP modeling, as it ensures that the data is on a consistent scale and that the periodogram is not dominated by the mean flux level.

nahiku.search module

class nahiku.search.Search(x, y, dominant_period, device='cpu')[source]

Bases: object

Base class for anomaly detection in light curves using Gaussian Processes. This base class provides common functionality for different search strategies (e.g., greedy, exhaustive) and build methods for constructing and optimizing a Gaussian Process model with a quasi-periodic kernel on the x and y data of a light curve.

build_constraints()[source]: Build the constraints for the GP model parameters.

build_gp_model(x=None, y=None, kernel=None, likelihood=None, mean=None, device=None)[source]

Initialize an ExactGPModel with the defined kernel, likelihood, and mean function.

Parameters:

kernel (GPytorch.kernel object) – kernel to use for the GP model (optional)
likelihood (GPytorch.likelihood object) – likelihood to use for the GP model (optional)
mean (GPytorch.mean object) – mean function to use for the GP model (optional)

build_kernel()[source]: Build the kernel for the GP model based on the dominant period of the light curve. Creates a Quasi-Periodic Kernel with constraints on the period length and lengthscales to guide the optimization process, and wraps it in a ScaleKernel to allow for scaling of the overall kernel output.

build_likelihood()[source]: Build the likelihood for the GP model.

build_mean()[source]: Build the mean function for the GP model.

plot()[source]: Plot two subplots: the original timeseries in black with identified anomalies plotted in red, and beside it the timeseries colored according to anomalous_signal with a colorbar

abstractmethod search_for_anomaly()[source]: All subclasses must implement this method.

train_gp(gp_model, likelihood, training_iterations=1000, lr=0.01, which_metric='mll', which_opt='adam', early_stopping=True, min_iterations=None, patience=1, plot=False, set_noise_equal_to_var_residuals=False, x=None, y=None, device=None)[source]

Train the GP model using the specified parameters and return the trained model, likelihood, and final log likelihood value.

Parameters:

training_iterations (int) – maximum number of training iterations (default: 1000)
lr (float) – learning rate for the optimizer (default: 0.01)
which_metric (str) – Metric to use for evaluating improvement during training. Options are ‘mll’ for marginal log likelihood and ‘mse’ for mean squared error. Default is ‘mll’.
which_opt (str) – Optimizer to use for training. Options are ‘adam’ and ‘sgd’. Default is ‘adam’.
early_stopping (bool) – Whether to use early stopping based on the training loss (default: True)
min_iterations (int or None) – Minimum number of iterations to train before considering early stopping (default: None, which sets it to training_iterations // 10)
patience (int) – Number of consecutive iterations with increasing loss to wait before stopping when early_stopping is True (default: 1)
plot (bool) – Whether to plot the training loss and covariance matrices after training (default: False)
set_noise_equal_to_var_residuals (bool) – Whether to set the likelihood noise variance equal to the variance of the residuals after training (default: False)
x (torch.Tensor or None) – Training input data (optional, defaults to self.x_tensor)
y (torch.Tensor or None) – Training target data (optional, defaults to self.y_tensor)
device (str or None) – Device to use for training (optional, defaults to self.device)

Module contents

nahiku.hello() → str[source]