API Reference
Public API for using limulus from Python.
Top-Level Functions
Convenience functions that can be executed as one-shot operations without creating a session.
- limulus.submit(code, *, backend='auto', **datasets)
One-shot execution (when session management is not needed).
Examples
>>> import limulus, pyarrow as pa
>>> result = limulus.submit("data out; set inp; run;", inp=pa.table({"x": [1, 2]}))
>>> result.success
True
Session
The main class responsible for dataset management and Data Step execution.
- class limulus.Session(*, backend='auto', runtime_backend=None, parser_backend='python', options=None)
Bases: object
- apply_dataset_options(source, *, keep=None, drop=None, rename=None, where=None, out=None, target=None)
Applies any combination of keep/drop/rename/where in a single call.
Application order: keep → drop → where → rename.
- Parameters:
source (str) – Source dataset name.
keep (Sequence[str] | None) – List of column names to retain.
drop (Sequence[str] | None) – List of column names to remove.
rename (Mapping[str, str] | None) – A {old_name: new_name} mapping dict.
where (str | None) – Filter expression (same format as where()).
out (str | None) – Output dataset name. If omitted, overwrites source.
- Return type:
Session
- Returns:
self (for method chaining).
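The application order matters: because where runs before rename, a filter has to use the original column names, and it cannot reference a column that keep or drop has already removed. The sketch below illustrates that order on plain Python rows. It is not limulus code: `apply_options_sketch` is a hypothetical stand-in, and it takes a Python predicate for `where` instead of the string expression that limulus accepts.

```python
from typing import Callable, Dict, List, Mapping, Optional, Sequence

Row = Dict[str, object]  # one row as {column_name: value}


def apply_options_sketch(
    rows: Sequence[Row],
    *,
    keep: Optional[Sequence[str]] = None,
    drop: Optional[Sequence[str]] = None,
    where: Optional[Callable[[Row], bool]] = None,
    rename: Optional[Mapping[str, str]] = None,
) -> List[Row]:
    """Hypothetical stand-in: apply keep -> drop -> where -> rename to rows."""
    out = [dict(r) for r in rows]  # copy so the input is left untouched
    if keep is not None:
        out = [{c: r[c] for c in keep} for r in out]
    if drop is not None:
        out = [{c: v for c, v in r.items() if c not in drop} for r in out]
    if where is not None:
        out = [r for r in out if where(r)]  # still sees the old column names
    if rename is not None:
        out = [{rename.get(c, c): v for c, v in r.items()} for r in out]
    return out


rows = [
    {"name": "Ann", "age": 14, "sex": "F"},
    {"name": "Bob", "age": 12, "sex": "M"},
]
result = apply_options_sketch(
    rows,
    keep=["name", "age"],
    where=lambda r: r["age"] > 13,  # filter on "age", not "age_years"
    rename={"age": "age_years"},
)
print(result)  # [{'name': 'Ann', 'age_years': 14}]
```

Filtering on `sex` in this call would fail with a `KeyError` in the sketch, since keep has already removed that column by the time where runs.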
- cast(source, mapping, out=None, *, target=None)
Alias for dataset-scoped column casting.
This is a convenience alias for the DatasetView.astype() workflow and forwards to the same internal implementation.
- dataset(name)
Returns an operation view for the specified dataset.
Use this when you want to chain keep()/drop()/where()/rename()/sort() calls.
- Parameters:
name (str) – Name of the dataset to create a view for.
- Return type:
DatasetView
- Returns:
A DatasetView instance.
Examples
>>> df = session.dataset("bmi").where("bmi > 20").keep(["name", "bmi"]).to_pandas()
- property datasets: DatasetCatalog
Returns the session catalog (DatasetCatalog).
Supports dict-like access (session.datasets["name"]), the in operator, and iteration.
Note
session["name"] is syntactic sugar for session.datasets["name"].
- drop(source, columns, out=None, *, target=None)
Removes the specified columns from a dataset (equivalent to DROP).
- filter(source, expression, out=None, *, target=None)
Row filtering (alias for where()).
- get_log()
Returns the most recent SubmitResult.
This method is equivalent to the log property and is provided for cases where method-style access is preferred. Returns None until the first submit()/run() call.
- Return type:
SubmitResult | None
- get_option(name=None, default=None)
Returns session options as a value or dictionary.
When name is omitted, returns a copy of all options. When a list of keys is given, returns a dictionary for those keys. When a single key is given, returns its value or default.
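The three lookup modes described above can be pictured with a plain dictionary. This is only a sketch of the documented behavior, not the limulus implementation: `get_option_sketch` and the option names are hypothetical, and filling missing keys with `default` in the list case is an assumption the reference does not confirm.

```python
from typing import Any, Dict, Mapping, Optional, Sequence, Union


def get_option_sketch(
    options: Mapping[str, Any],
    name: Optional[Union[str, Sequence[str]]] = None,
    default: Any = None,
) -> Any:
    """Hypothetical stand-in for the three lookup modes."""
    if name is None:
        # No name: return a copy, so callers cannot mutate the stored options.
        return dict(options)
    if isinstance(name, str):
        # Single key: return its value, or the default when it is missing.
        return options.get(name, default)
    # List of keys: return a dictionary restricted to those keys.
    # Assumption: missing keys are filled with `default`.
    return {key: options.get(key, default) for key in name}


options: Dict[str, Any] = {"strict": True, "encoding": "utf-8"}  # made-up names

print(get_option_sketch(options))                         # {'strict': True, 'encoding': 'utf-8'}
print(get_option_sketch(options, "strict"))               # True
print(get_option_sketch(options, "missing", default=0))   # 0
print(get_option_sketch(options, ["strict", "missing"]))  # {'strict': True, 'missing': None}
```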
- include(path)
Reads a DSL file and executes it via submit().
- Parameters:
path (str) – Path to a UTF-8 encoded Data Step script.
- Return type:
SubmitResult
- Returns:
The resulting SubmitResult.
- keep(source, columns, out=None, *, target=None)
Retains only the specified columns from a dataset (equivalent to KEEP).
- Return type:
Session
- Returns:
self (for method chaining).
Examples
>>> session.keep("bmi", ["name", "sex", "bmi"])
>>> session.keep("bmi", ["name", "bmi"], out="bmi_slim")
- load(name, data)
Registers a single dataset in the session catalog.
Examples
>>> import limulus, pyarrow as pa
>>> session = limulus.Session()
>>> session.load("mydata", pa.table({"x": [1, 2, 3]}))
- loads(datasets=None, **named_datasets)
Registers multiple datasets at once.
Examples
>>> session.loads({"a": df_a, "b": df_b})
>>> session.loads(a=df_a, b=df_b)  # also accepted as keyword arguments
- property log: SubmitResult | None
The most recent SubmitResult, or None if nothing has been submitted yet.
- rename(source, mapping, out=None, *, target=None)
Renames columns (equivalent to RENAME).
- Return type:
Session
- Returns:
self (for method chaining).
Examples
>>> session.rename("bmi", {"height_m": "height_meter"})
- select(source, columns, out=None, *, target=None)
Column selection (alias for keep()).
- set_option(options)
Merges session options from a dictionary.
These options are currently reserved for future use, but are forwarded to the executor on each submit() call.
- sort(source, by, out=None, *, nodupkey=False, target=None)
Sorts a dataset by columns.
- Parameters:
source (str) – Source dataset name.
by (Sequence[str]) – List of column names to sort by. Can also be a list of (column_name, "ascending" | "descending") tuples to specify direction.
out (str | None) – Output dataset name. If omitted, overwrites source.
nodupkey (bool) – When True, keeps only the first row for each unique key defined by by after sorting.
- Return type:
Session
- Returns:
self (for method chaining).
Examples
>>> session.sort("class", "age")
>>> session.sort("class", ["age", "name"])
>>> session.sort("class", [("age", "descending")])
>>> session.sort("class", ["age"], nodupkey=True)
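To make the `by` and `nodupkey` semantics concrete, here is a minimal pure-Python sketch of a multi-key sort with per-column direction, followed by first-row-per-key deduplication. `sort_sketch` is a hypothetical helper operating on lists of dicts; it illustrates the documented behavior only and says nothing about how limulus implements sorting internally.

```python
from typing import Dict, List, Sequence, Tuple, Union

Row = Dict[str, object]
SortKey = Union[str, Tuple[str, str]]  # "col" or ("col", "ascending" | "descending")


def sort_sketch(
    rows: Sequence[Row],
    by: Union[str, Sequence[SortKey]],
    nodupkey: bool = False,
) -> List[Row]:
    """Hypothetical stand-in: stable multi-key sort, optional dedup by key."""
    if isinstance(by, str):
        by = [by]  # accept a single column name, as in the first example above
    keys = [(k, "ascending") if isinstance(k, str) else (k[0], k[1]) for k in by]

    out = list(rows)
    # Python's sort is stable, so sorting by the last key first and the first
    # key last yields a correct multi-key ordering with mixed directions.
    for column, direction in reversed(keys):
        out.sort(key=lambda r, c=column: r[c], reverse=(direction == "descending"))

    if nodupkey:
        seen = set()
        deduped = []
        for row in out:
            key = tuple(row[c] for c, _ in keys)
            if key not in seen:  # keep only the first row for each key
                seen.add(key)
                deduped.append(row)
        out = deduped
    return out


rows = [
    {"name": "Ann", "age": 14},
    {"name": "Bob", "age": 12},
    {"name": "Cy", "age": 14},
]
by_age_desc = sort_sketch(rows, [("age", "descending")])
print([r["name"] for r in by_age_desc])  # ['Ann', 'Cy', 'Bob']

unique_age = sort_sketch(rows, ["age"], nodupkey=True)
print([r["name"] for r in unique_age])   # ['Bob', 'Ann']
```

In the sketch, which of several rows sharing a key survives `nodupkey` depends on their order after the sort, so ties are resolved by input order.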
- sql(query, out=None, *, target=None)
Executes SQL against session datasets.
- Returns:
A pyarrow.Table containing the query result.
SQL execution is backed by the Polars SQL engine (https://docs.pola.rs/api/python/stable/reference/sql/index.html). If the SQL starts with CREATE TABLE name AS ..., the result is also stored in the session catalog under name.
- submit(code, *, backend=None, show_result=False)
Submits Data Step code to the session and executes it.
Resulting datasets are added to or updated in the session catalog. Datasets created by a previous submit() call can be referenced in subsequent calls.
- Parameters:
code (str) – Data Step DSL text to execute. Multiple DATA ... RUN; blocks may be included.
backend (str | None) – Optional runtime backend override for this call only. One of "rust", "python", or "auto". When given, temporarily overrides the session-level backend setting.
show_result (bool) – Controls notebook/REPL representation of the returned SubmitResult. Defaults to False for Session.submit() to keep successful calls quiet.
- Return type:
SubmitResult
- Returns:
A SubmitResult. Check result.success to determine whether errors occurred.
Examples
>>> rc = session.submit("""
... data result;
...     set mydata;
...     bmi = round(weight / height**2, 0.1);
...     keep name bmi;
... run;
... """)
>>> print(session["result"].to_pandas())
- to_arrow(name)
Returns a dataset as a pyarrow.Table.
- Parameters:
name (str) – Dataset name.
- Returns:
pyarrow.Table.
- to_pandas(name)
Converts a dataset to a pandas.DataFrame and returns it.
- Parameters:
name (str) – Dataset name.
- Returns:
pandas.DataFrame.
- to_polars(name)
Converts a dataset to a polars.DataFrame and returns it.
- Parameters:
name (str) – Dataset name.
- Returns:
polars.DataFrame.
- unload(*names, missing_ok=True)
Removes one or more datasets from the session.
- Return type:
bool
- Returns:
True if at least one dataset was removed.
Examples
>>> session.unload("tmp")
>>> session.unload("a", "b")
>>> session.unload(["a", "b"])
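The return value and the accepted argument shapes can be sketched against a plain dictionary standing in for the session catalog. `unload_sketch` is hypothetical, and raising `KeyError` when `missing_ok=False` is an assumption made for illustration; the reference above does not say what limulus does in that case.

```python
from typing import Dict, Iterable, List, Union


def unload_sketch(
    catalog: Dict[str, object],
    *names: Union[str, Iterable[str]],
    missing_ok: bool = True,
) -> bool:
    """Hypothetical stand-in: remove datasets, report whether any was removed."""
    # Accept both unload("a", "b") and unload(["a", "b"]).
    flat: List[str] = []
    for item in names:
        if isinstance(item, str):
            flat.append(item)
        else:
            flat.extend(item)

    removed_any = False
    for name in flat:
        if name in catalog:
            del catalog[name]
            removed_any = True
        elif not missing_ok:
            # Assumption: a missing dataset is an error only when missing_ok=False.
            raise KeyError(name)
    return removed_any


catalog = {"a": object(), "b": object(), "tmp": object()}
print(unload_sketch(catalog, "tmp"))       # True
print(unload_sketch(catalog, ["a", "b"]))  # True
print(unload_sketch(catalog, "tmp"))       # False, already removed
```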
- where(source, expression, out=None, *, target=None)
Filters rows using a simple comparison expression (equivalent to WHERE).
Supported expression format: "column op value" (e.g. "age > 13", "sex = 'M'").
- Return type:
Session
- Returns:
self (for method chaining).
- Raises:
ValueError – If the expression format is invalid.
Examples
>>> session.where("class", "age > 13")
>>> session.where("class", "sex = 'M'", out="male")
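As a rough picture of what a "column op value" expression involves, the sketch below parses such a string into a column name, a comparison function, and a literal, and raises ValueError for anything else. It is an illustration under assumptions, not the limulus parser: `parse_where_sketch` is hypothetical, and the operator set (=, !=, >, <, >=, <=) and the literal rules (single-quoted strings, integers, floats) are guesses beyond the two documented examples.

```python
import operator
import re
from typing import Callable, Tuple, Union

Literal = Union[str, int, float]

# Assumed operator set; only ">" and "=" appear in the documented examples.
_OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}
# Two-character operators come first so ">=" is not read as ">" then "=".
_PATTERN = re.compile(r"^\s*(\w+)\s*(>=|<=|!=|=|>|<)\s*(.+?)\s*$")


def parse_where_sketch(expression: str) -> Tuple[str, Callable, Literal]:
    """Hypothetical parser for 'column op value' expressions."""
    match = _PATTERN.match(expression)
    if match is None:
        raise ValueError(f"invalid expression: {expression!r}")
    column, op, raw = match.groups()

    if len(raw) >= 2 and raw[0] == "'" and raw[-1] == "'":
        value: Literal = raw[1:-1]  # single-quoted string literal
    else:
        try:
            value = int(raw)
        except ValueError:
            try:
                value = float(raw)
            except ValueError:
                raise ValueError(f"invalid value in expression: {expression!r}") from None
    return column, _OPS[op], value


rows = [
    {"name": "Ann", "age": 14, "sex": "F"},
    {"name": "Bob", "age": 12, "sex": "M"},
]

column, compare, value = parse_where_sketch("age > 13")
print([r["name"] for r in rows if compare(r[column], value)])  # ['Ann']

column, compare, value = parse_where_sketch("sex = 'M'")
print([r["name"] for r in rows if compare(r[column], value)])  # ['Bob']
```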
DatasetView
A view class returned by Session.dataset() for chained operations.
- class limulus.session.DatasetView(session, name)
Bases: object
A view for chained operations on a dataset.
Obtained via Session.dataset(). You can chain select()/keep()/drop()/where()/rename()/sort() calls.
Examples
>>> view = session.dataset("bmi")
>>> df = view.where("bmi > 20").keep(["name", "bmi"]).to_pandas()
- apply_options(*, keep=None, drop=None, rename=None, where=None, out=None)
Applies keep/drop/rename/where in a single call.
- Return type:
DatasetView
- Returns:
A DatasetView for the resulting dataset.
- astype(mapping, out=None)
Casts columns using a {column: dtype} mapping.
- Return type:
DatasetView
- Returns:
A DatasetView for the resulting dataset.
Note
Column names in this column-oriented API are currently case-sensitive.
- drop(columns, out=None)
Removes the specified columns (equivalent to DROP).
- Return type:
DatasetView
- Returns:
A DatasetView for the resulting dataset.
- keep(columns, out=None)
Retains only the specified columns (equivalent to KEEP).
- Return type:
DatasetView
- Returns:
A DatasetView for the resulting dataset.
- rename(mapping, out=None)
Renames columns (equivalent to RENAME).
- Return type:
DatasetView
- Returns:
A DatasetView for the resulting dataset.
- select(columns, out=None)
Retains only the specified columns.
- Return type:
DatasetView
- Returns:
A DatasetView for the resulting dataset.
- sort(by, out=None, *, nodupkey=False)
Sorts the dataset by columns.
- Parameters:
by (Sequence[str]) – List of column names to sort by. Can also be a list of (column_name, "ascending" | "descending") tuples to specify direction.
out (str | None) – Output dataset name. If omitted, overwrites this view's dataset.
nodupkey (bool) – When True, keeps only the first row for each unique key defined by by after sorting.
- Return type:
DatasetView
- Returns:
A DatasetView for the resulting dataset.
- where(expression, out=None)
Filters rows using a simple comparison expression (equivalent to WHERE).
- Return type:
DatasetView
- Returns:
A DatasetView for the resulting dataset.
SubmitResult
The return value of Session.submit().
LogEntry
A class representing a single entry in the execution log.