nahiku package

Submodules

nahiku.balmung module

This code originally developed by Dan Hey at https://github.com/danhey/balmung/blob/master/balmung/balmung.py This file is used with permission for the development of the Nahiku package.

class nahiku.balmung.Balmung(time: ndarray, flux: ndarray)[source]

Bases: object

amplitude_spectrum(t, y, fmin: float = None, fmax: float = None, oversample_factor: float = 5.0) tuple[source]

Calculates the amplitude spectrum at a given time and flux input

Parameters:
  • t (np.ndarray) – Time values

  • y (np.ndarray) – Flux values

  • fmin (float, optional) – Minimum frequency. Defaults to None.

  • fmax (float, optional) – Maximum frequency. Defaults to None.

  • oversample_factor (float, optional) – Amount by which to oversample the light curve. Defaults to 5.0.

Returns:

Frequency and amplitude arrays

Return type:

tuple

dft_phase(x: ndarray, y: ndarray, f: float) float[source]

Calculates the phase at a single frequency using the Discrete Fourier Transform

Parameters:
  • x (np.ndarray) – Time values

  • y (np.ndarray) – Flux values

  • f (float) – Frequency

Returns:

Phase at given frequency

Return type:

float

estimate_background(x: ndarray, y: ndarray, log_width: float = 0.01) ndarray[source]

Estimates the background signal

Parameters:
  • x (np.ndarray) – [description]

  • y (np.ndarray) – [description]

  • log_width (float, optional) – [description]. Defaults to 0.01.

Returns:

[description]

Return type:

[type]

find_highest_peak(f: ndarray, a: ndarray) float[source]

Uses three point parabolic interpolation to find the highest peaks in the amplitude spectrum

Parameters:
  • f (np.ndarray) – Frequency values

  • a (np.ndarray) – Amplitude values

Returns:

Maximum frequency

Return type:

float

fit(theta: list) ndarray[source]

Small wrapper for curve_fit.

Parameters:
  • time (np.ndarray) – Time values

  • flux (np.ndarray) – Flux values

  • theta (list) – Array-like of initial guesses

Returns:

Fitted parameters

Return type:

list

grad_model(time: ndarray, freq: float, amp: float, phi: float) ndarray[source]

Gradient function of our pulsation model

Parameters:
  • time (np.ndarray) – Time values

  • freq (float) – Frequency

  • amp (float) – Amplitude

  • phi (float) – Phase

Returns:

Gradient vector (dModel/d_{freq,amp,phi})

Return type:

np.ndarray

initialize_guess(fmin: float, fmax: float)[source]
model(time: ndarray, freq: float, amp: float, phi: float) ndarray[source]

And at the heart of it all, a tiny model function.

Parameters:
  • time (np.ndarray) – Time values

  • freq (float) – Frequency

  • amp (float) – Amplitude

  • phi (float) – Phase

Returns:

Sinusoid at the given parameters

Return type:

np.ndarray

plot(ax=None)[source]
plot_lc(ax=None)[source]
plot_residual(ax=None)[source]
prewhiten(fmin=None, fmax=None, minimum_snr=0, maxiter=200, diagnose=True)[source]

nahiku.exhaustive_helpers module

nahiku.exhaustive_helpers.compute_interval_pvalue(y_true, mvn_pred)[source]

This function computes the Mahalanobis distance and corresponding p-value for a true observation under a predicted multivariate normal distribution. It attempts to compute the Mahalanobis distance using Cholesky decomposition for numerical stability, and falls back to direct solving if Cholesky fails (e.g., if the covariance is not positive definite). The p-value is computed based on the chi-squared distribution with degrees of freedom equal to the dimensionality of the data.

Parameters:
  • y_true (torch.Tensor) – shape (d,)

  • mvn_pred (MultivariateNormal) – MultivariateNormal from GPyTorch (mean: (d,), covar: (d,d))

nahiku.exhaustive_helpers.interval_posterior_from_precision(mu, J, full_y, mask_train, mask_test, dtype=torch.float64)[source]

Compute the posterior distribution p(y_test | y_train) using precision matrix J and mean mu. This function computes the conditional posterior distribution over test points given training data, using the precision (inverse covariance) matrix and the prior mean. It leverages the block structure of the precision matrix to efficiently compute the conditional distribution.

Parameters:
  • mu (torch.Tensor) – (N,) mean vector over the full grid

  • J (torch.Tensor) – (N, N) precision matrix over the full grid, where J = (K + σ²I)^{-1}

  • full_y (torch.Tensor) – (N,) output values on the full grid

  • mask_train (torch.Tensor or bool array) – (N,) boolean mask indicating training data points

  • mask_test (torch.Tensor or bool array) – (N,) boolean mask indicating test data points

  • dtype (torch.dtype) – Data type for computations (default: torch.float64)

nahiku.exhaustive_helpers.precompute_precision(full_x, mean_module, kernel_module, noise_variance, dtype=torch.float64, device='cpu')[source]

Build μ and J = (K + σ²I)^{-1} for the full X once. Use float64 here for numerical stability; you can cast later if desired.

Parameters:
  • full_x (torch.Tensor) – (N, D) tensor of all input locations

  • mean_module (GPyTorch mean module) – GPyTorch mean module that can be called on full_x

  • kernel_module (GPyTorch kernel module) – GPyTorch kernel module that can be called on full_x to produce a LazyTensor covariance

  • noise_variance (float) – observation noise variance σ²

  • dtype (torch.dtype) – torch dtype for computations (default: torch.float64 for stability)

  • device (str) – torch device for computations (default: “cpu”)

nahiku.gp_helpers module

class nahiku.gp_helpers.ExactGPModel(train_x, train_y, likelihood, kernel, mean)[source]

Bases: ExactGP

Initialize the ExactGPModel with a specified kernel and mean function, and define the forward method to compute the GP output.

forward(x)[source]

Compute the forward pass of the ExactGPModel.

Parameters:

x (torch.Tensor) – Input data points

class nahiku.gp_helpers.ParameterizedGPModel(kernel, mean)[source]

Bases: GP

Base GP model that allows for parameterized kernels and mean functions, without the need for training data at initialization.

forward(x)[source]

Compute the forward pass of the ParameterizedGPModel.

Parameters:

x (torch.Tensor) – Input data points

class nahiku.gp_helpers.QuasiPeriodicKernel(periodic_kernel=None, rbf_kernel=None, **kwargs)[source]

Bases: Kernel

A custom kernel that combines a periodic kernel with an RBF kernel to model quasi-periodic behavior in light curves.

forward(x1, x2, diag=False, **params)[source]

Compute the kernel value between two sets of inputs by combining the periodic and RBF kernels.

Parameters:
  • x1 (torch.Tensor) – First set of input points

  • x2 (torch.Tensor) – Second set of input points

  • diag (bool) – Whether to compute only diagonal elements

nahiku.nahiku module

class nahiku.nahiku.Nahiku(time, flux, anomalies=None, prominence=50, plot_dominant_period=False)[source]

Bases: object

This class represents a light curve and provides methods for anomaly detection using greedy and exhaustive search, as well as methods for plotting, standardizing, prewhitening, calculating the dominant period, injecting anomalies, and checking the accuracy of identified anomalies against true and injected anomalies.

check_identified_anomalies(buffer=5)[source]

Check the identified anomalies against the true and injected anomalies, and print out the results.

Parameters:

buffer (int) – number of indices on either side of the true and injected anomaly indices to consider as a match for an identified anomaly (default: 5)

Initialize and run a ExhaustiveSearch. Accepts all arguments for ExhaustiveSearch.__init__ and ExhaustiveSearch.search_for_anomaly.

static freq_idx_to_period_days(freqs_idx, times)[source]

Function to convert frequency indices from a periodogram to periods in days, using the time points of the original data to calculate the scaling factor for the conversion.

Parameters:
  • freqs_idx (1D array-like) – Array of frequency indices to convert to periods in days.

  • times (1D array-like) – Array of time points corresponding to the original data, used to calculate the scaling factor for converting frequency indices to periods in days.

static from_lightkurve(**kwargs)[source]

Load a light curve using the lightkurve.search_lightcurve function, with options to specify various parameters for the search. Documentation for lightkurve.search_lightcurve parameters can be found here: https://lightkurve.github.io/lightkurve/reference/api/lightkurve.search_lightcurve.html

Parameters:

kwargs – Additional keyword arguments to pass to lightkurve.search_lightcurve and Nahiku.__init__

static from_synthetic_parameterized_gp(num_days=100, num_steps=1000, seed=48, add_high_residuals=False, device='cpu', mean_constant=None, outputscale=None, periodic_lengthscale=None, period=None, rbf_lengthscale=None, noise_std=None, num_high_residuals=None, mean_high_residuals=None, var_high_residuals=None, **kwargs)[source]

Sample a function from a Gaussian Process with a scaled quasi-periodic kernel, constant mean, and Gaussian likelihood, with options to specify various parameters for the kernel, mean function, likelihood noise, and number high residuals to add in.

Parameters:
  • num_days (float) – total duration of the light curve in days (default: 100)

  • num_steps (int) – number of time steps in the light curve (default: 1000)

  • seed (int) – random seed for reproducibility (default: 48)

  • add_high_residuals (bool) – whether to add high residuals to the sampled function to create more challenging anomalies (default: False)

  • device (str) – device to use for GP sampling, either “cpu” or “cuda” (default: “cpu”)

  • mean_constant (float or None) – constant value for the mean function. If not provided, randomly chosen between -1 and 1 (Optional)

  • outputscale (float or None) – output scale for the scaled quasi-periodic kernel. If not provided, randomly chosen between 0.1 and 10 (Optional)

  • periodic_lengthscale (float or None) – length scale for the periodic component of the quasi-periodic kernel. If not provided, randomly chosen between 0.5 and num_days/4 (Optional)

  • period (float or None) – period for the periodic component of the quasi-periodic kernel. If not provided, randomly chosen between 0.5 and num_days (Optional)

  • rbf_lengthscale (float or None) – length scale for the RBF component of the quasi-periodic kernel. If not provided, randomly chosen between 0.5 and num_days/2 (Optional)

  • noise_std (float or None) – standard deviation of the Gaussian noise to add to the sampled function. If not provided, randomly chosen between 0.1 and 1 (Optional)

  • num_high_residuals (int or None) – number of high residuals to add to the sampled function if add_high_residuals is True. If not provided, randomly chosen between 5 and 25 (Optional)

  • mean_high_residuals (float or None) – mean of the Gaussian distribution to sample the high residuals from if add_high_residuals is True. If not provided, randomly chosen between -1 and 1 (Optional)

  • var_high_residuals (float or None) – variance of the Gaussian distribution to sample the high residuals from if add_high_residuals is True. If not provided, randomly chosen between 0.1 and 10 (Optional)

  • kwargs – Additional keyword arguments to pass to Nahiku.__init__

static from_synthetic_parameterized_noise(num_days=100, num_steps=1000, seed=48, rednoise_amp=1.0, whitenoise_amp=1.0, period=None, phase=None, amp=None, slope=None, rednoise_time_scale=None, random_noise_step_loc=None, **kwargs)[source]

Generate a synthetic light curve with a sinusoidal signal, white noise, red noise, a step function anomaly, and a linear trend.

Parameters:
  • num_days (float) – total duration of the light curve in days (default: 100)

  • num_steps (int) – number of time steps in the light curve (default: 1000)

  • seed (int) – random seed for reproducibility (default: 48)

  • rednoise_amp (float) – amplitude of red noise (default: 1.0)

  • whitenoise_amp (float) – amplitude of white noise (default: 1.0)

  • period (float) – period of the sinusoidal signal (default: randomly chosen between 175 and 225)

  • phase (float) – phase of the sinusoidal signal (default: randomly chosen between 0 and 2*pi)

  • amp (float) – amplitude of the sinusoidal signal (default: randomly chosen between 0 and 0.9)

  • slope (float) – slope of the linear trend (default: randomly chosen between -0.001 and 0.001)

  • rednoise_time_scale (float) – correlation time scale of the red noise (default: randomly chosen between 5 and 15)

  • random_noise_step_loc (float) – location of the step function anomaly (default: randomly chosen between 0 and num_steps)

  • kwargs – Additional keyword arguments to pass to Nahiku.__init__

get_dominant_period(prominence=50, plot=False)[source]

Function to calculate the dominant period of a light curve using the periodogram and peak detection. It also includes an option to plot the periodogram and the light curve with the dominant period sinusoid.

Parameters:
  • prominence (int) – minimum prominence of peaks to consider in the periodogram (default: 50)

  • plot (bool) – whether to plot the periodogram and light curve (default: False)

static get_events(indices)[source]

Helper to turn a list of indices into a list of (start, end) tuples.

Parameters:

indices (list of int) – list of indices to group into events

Initialize and run a GreedySearch. Accepts all arguments for GreedySearch.__init__ and GreedySearch.search_for_anomaly.

inject_anomaly(num_anomalies, absolute_width=None, absolute_depth=None, idxs=None, seed=48, shapes=['gaussian', 'saw', 'exocomet'], period_scale=None, snr=None, alpha=1)[source]

Inject an anomaly into the light curve, with options to specify the number of anomalies, their shapes, widths, depths, and locations.

Parameters:
  • num_anomalies (int) – number of anomalies to inject

  • absolute_width (float or None) – absolute width of the anomaly. If specified, period_scale is ignored (default: None)

  • absolute_depth (float or None) – absolute depth of the anomaly. If specified, snr is ignored (default: None)

  • idxs (list of float or None) – list of indices to inject anomalies at. If None, indices are randomly chosen (default: None)

  • seed (int) – random seed for reproducibility (default: 48)

  • shapes (list of str) – list of shapes to choose from for the anomalies. Options are “gaussian” for gaussian-shaped anomalies, “saw” for sawtooth-shaped anomalies, and “exocomet” for exocomet-shaped anomalies. Default is [“gaussian”, “saw”, “exocomet”].

  • period_scale (float or None) – ratio of the dominant period to use as the width of the anomaly. If None, randomly chosen between 0.1 and 5 (default: None)

  • snr (float or None) – signal to noise ratio of the anomaly. If None, randomly chosen between 0.5 and 10 (default: None)

  • alpha (float) – shape parameter for the exocomet profile, which controls the asymmetry of the anomaly. Higher values of alpha result in a more asymmetric profile with a steeper ingress and a shallower egress (default: 1)

plot(show_identified_points=True)[source]

Plot the light curve with shaded regions for injected/true anomalies and optional red x’s for identified anomalies.

Parameters:

show_identified_points (bool) – whether to plot the identified anomalous points as red x’s (default: True)

prewhiten(plot=False, **kwargs)[source]

Prewhiten a light curve using the balmung.prewhiten function, with options to specify various parameters for the removal of frequencies. Code for balmung.prewhiten can be found here: https://github.com/danhey/balmung/blob/master/balmung/balmung.py

Parameters:
  • plot (bool) – whether to plot the light curve before and after prewhitening (default: True)

  • kwargs – Additional keyword arguments to pass to balmung.prewhiten and Nahiku.__init__

standardize()[source]

Function to standardize the flux of the light curve by subtracting the mean and dividing by the standard deviation. This is important for the periodogram calculation and GP modeling, as it ensures that the data is on a consistent scale and that the periodogram is not dominated by the mean flux level.

nahiku.search module

class nahiku.search.Search(x, y, dominant_period, device='cpu')[source]

Bases: object

Base class for anomaly detection in light curves using Gaussian Processes. This base class provides common functionality for different search strategies (e.g., greedy, exhaustive) and build methods for constructing and optimizing a Gaussian Process model with a quasi-periodic kernel on the x and y data of a light curve.

build_constraints()[source]

Build the constraints for the GP model parameters.

build_gp_model(x=None, y=None, kernel=None, likelihood=None, mean=None, device=None)[source]

Initialize an ExactGPModel with the defined kernel, likelihood, and mean function.

Parameters:
  • kernel (GPytorch.kernel object) – kernel to use for the GP model (optional)

  • likelihood (GPytorch.likelihood object) – likelihood to use for the GP model (optional)

  • mean (GPytorch.mean object) – mean function to use for the GP model (optional)

build_kernel()[source]

Build the kernel for the GP model based on the dominant period of the light curve. Creates a Quasi-Periodic Kernel with constraints on the period length and lengthscales to guide the optimization process, and wraps it in a ScaleKernel to allow for scaling of the overall kernel output.

build_likelihood()[source]

Build the likelihood for the GP model.

build_mean()[source]

Build the mean function for the GP model.

plot()[source]

Plot two subplots: the original timeseries in black with identified anomalies plotted in red, and beside it the timeseries colored according to anomalous_signal with a colorbar

abstractmethod search_for_anomaly()[source]

All subclasses must implement this method.

train_gp(gp_model, likelihood, training_iterations=1000, lr=0.01, which_metric='mll', which_opt='adam', early_stopping=True, min_iterations=None, patience=1, plot=False, set_noise_equal_to_var_residuals=False, x=None, y=None, device=None)[source]

Train the GP model using the specified parameters and return the trained model, likelihood, and final log likelihood value.

Parameters:
  • training_iterations (int) – maximum number of training iterations (default: 1000)

  • lr (float) – learning rate for the optimizer (default: 0.01)

  • which_metric (str) – Metric to use for evaluating improvement during training. Options are ‘mll’ for marginal log likelihood and ‘mse’ for mean squared error. Default is ‘mll’.

  • which_opt (str) – Optimizer to use for training. Options are ‘adam’ and ‘sgd’. Default is ‘adam’.

  • early_stopping (bool) – Whether to use early stopping based on the training loss (default: True)

  • min_iterations (int or None) – Minimum number of iterations to train before considering early stopping (default: None, which sets it to training_iterations // 10)

  • patience (int) – Number of consecutive iterations with increasing loss to wait before stopping when early_stopping is True (default: 1)

  • plot (bool) – Whether to plot the training loss and covariance matrices after training (default: False)

  • set_noise_equal_to_var_residuals (bool) – Whether to set the likelihood noise variance equal to the variance of the residuals after training (default: False)

  • x (torch.Tensor or None) – Training input data (optional, defaults to self.x_tensor)

  • y (torch.Tensor or None) – Training target data (optional, defaults to self.y_tensor)

  • device (str or None) – Device to use for training (optional, defaults to self.device)

Module contents

nahiku.hello() str[source]