Quickstart

0. Installation

pip install limulus

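pandas support is optional and requires the pandas extra: pip install "limulus[pandas]" (the quotes keep shells such as zsh from interpreting the square brackets).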
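
Because pandas is optional, a script may want to check for it before calling pd.read_csv. The sketch below uses a generic standard-library idiom, not a limulus API. The helper name has_module is illustrative. It only checks that pandas itself can be imported, not that the limulus extra was installed.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` can be imported here."""
    # find_spec returns None when no importable module of that name exists.
    return importlib.util.find_spec(name) is not None


if has_module("pandas"):
    print("pandas is importable")
else:
    print('pandas not found; try: pip install "limulus[pandas]"')
```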

1. Prepare Input Data

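To explore the basic operations, let’s start with the well-known Iris flower dataset.
The standard input formats are pyarrow.Table and polars.DataFrame. pandas.DataFrame is also supported through the optional pandas extra. The example below reads the CSV with pandas, so it needs that extra installed.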

import pandas as pd

iris_data = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
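
You may be offline, or you may just want a tiny dataset to experiment with. In either case, the header and first five rows shown in the polars output of section 4 can be rebuilt with the standard library. This sketch parses CSV text into a column-oriented dict. The sample text and the csv_to_columns helper are illustrative and not part of limulus. Both pyarrow.table() and polars.DataFrame() accept a dict of column names to lists, so the result can be converted to one of the documented input formats before it is passed to loads().

```python
import csv
import io

# Header and first five Iris rows, as printed in the polars output of section 4.
IRIS_SAMPLE_CSV = """\
sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
4.6,3.1,1.5,0.2,setosa
5.0,3.6,1.4,0.2,setosa
"""


def csv_to_columns(text: str) -> dict[str, list]:
    """Parse CSV text into {column: values}, turning numeric fields into floats."""
    reader = csv.DictReader(io.StringIO(text))
    columns: dict[str, list] = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            try:
                columns[name].append(float(value))
            except ValueError:
                # Non-numeric fields (such as species) stay as strings.
                columns[name].append(value)
    return columns


iris_columns = csv_to_columns(IRIS_SAMPLE_CSV)
```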

2. Load into a Session

limulus manages data through a Session object, which plays the role of a work library.
After creating a Session, call loads() with a dictionary that maps table names to data. Each key (here "iris") becomes the table name you use in DATA steps.

import limulus

session = limulus.Session()
session.loads({"iris": iris_data})

3. Run a DATA Step

DATA step code is executed with submit() (or run()).
Datasets can be referenced by table name alone (iris1) or with the work. prefix (work.iris2).
Several DATA steps can be submitted in one call, and tables created by earlier steps can be referenced by later ones.

session.submit("""
data iris1 ;
  set iris;
  where petal_length > 1.5 ;
run;

data work.iris2 ;
  set iris1;
run;
""")

4. Retrieving and Working with Datasets

limulus uses Apache Arrow as its data exchange format.
session[name] returns the table as an Arrow table (pyarrow.Table).
session.dataset(name) returns a dataset object that supports conversions such as to_polars(), as well as sorting and other operations.

ar_table = session["iris1"]
pl_table = session.dataset("iris").to_polars()

print(pl_table.head(5))
shape: (5, 5)
┌──────────────┬─────────────┬──────────────┬─────────────┬─────────┐
│ sepal_length ┆ sepal_width ┆ petal_length ┆ petal_width ┆ species │
│ ---          ┆ ---         ┆ ---          ┆ ---         ┆ ---     │
│ f64          ┆ f64         ┆ f64          ┆ f64         ┆ str     │
╞══════════════╪═════════════╪══════════════╪═════════════╪═════════╡
│ 5.1          ┆ 3.5         ┆ 1.4          ┆ 0.2         ┆ setosa  │
│ 4.9          ┆ 3.0         ┆ 1.4          ┆ 0.2         ┆ setosa  │
│ 4.7          ┆ 3.2         ┆ 1.3          ┆ 0.2         ┆ setosa  │
│ 4.6          ┆ 3.1         ┆ 1.5          ┆ 0.2         ┆ setosa  │
│ 5.0          ┆ 3.6         ┆ 1.4          ┆ 0.2         ┆ setosa  │
└──────────────┴─────────────┴──────────────┴─────────────┴─────────┘

5. Execution Logs

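When execution succeeds, no log is printed by default. To inspect the log anyway, get the log object with session.log (or session.get_log()) and call its print_log() method.

When execution fails, the error log is printed automatically. In the output below, the first three lines come from print_log() and show the log of the preceding successful submission. The remaining lines were printed automatically by the failing submission.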

# Display the log of the last (successful) submission
session.log.print_log()

# Failure case: the input dataset "dummy" does not exist
session.submit("""
data iris3 ;
  set dummy;
run;
""")
success: True
0.02 seconds elapsed
(no log entries)
success: False
0.0 seconds elapsed
Error
 [stage: resolve inputs]: Input dataset is not provided: dummy
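
The error comes from the resolve-inputs stage. Judging from the message, every dataset named in a SET statement has to exist already, in one of two ways. It may already be in the session, because it was loaded or produced by an earlier submission. Or it may be created by an earlier step in the same submission. Below is a hypothetical, regex-based sketch of that check, limited to the simple data/set/run form used in this quickstart. It is not limulus's parser, and it ignores dataset options, comments and other SAS syntax. The names STEP_RE, SET_RE, table_name and missing_inputs are illustrative.

```python
import re

# One DATA step: "data <output>; <body> run;" (simplified; no dataset options).
STEP_RE = re.compile(r"\bdata\s+([\w.]+)\s*;(.*?)\brun\s*;", re.IGNORECASE | re.DOTALL)
# A SET statement listing one or more input tables.
SET_RE = re.compile(r"\bset\s+([^;]+);", re.IGNORECASE)


def table_name(ref: str) -> str:
    """Treat 'work.name' and 'name' as the same table."""
    return ref[len("work."):] if ref.startswith("work.") else ref


def missing_inputs(code: str, existing: set[str]) -> list[str]:
    """Return SET inputs that neither exist yet nor come from an earlier step."""
    available = set(existing)
    missing: list[str] = []
    for output, body in STEP_RE.findall(code):
        for statement in SET_RE.finditer(body):
            for ref in statement.group(1).split():
                name = table_name(ref)
                if name not in available and name not in missing:
                    missing.append(name)
        # Each step's output is visible to the steps that follow it.
        available.add(table_name(output))
    return missing


print(missing_inputs("data iris3 ;\n  set dummy;\nrun;", {"iris", "iris1", "iris2"}))  # ['dummy']
```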

6. Removing Data

Use unload() with a list of table names to remove tables that are no longer needed.

session.unload(["iris1"])
True

One-Shot Execution

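To run code once without creating a session, call limulus.submit() and pass the input tables as keyword arguments. Each keyword (here src_iris) becomes a table name.
Output datasets are available from the returned result's datasets mapping, for example rc.datasets["iris1"].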

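rc = limulus.submit(
    "data iris1; set src_iris; where petal_length > 1.5; run;",
    src_iris=iris_data,  # available to the code as the table src_iris
)

# Show the first five rows of the resulting pyarrow.Table
print(rc.datasets["iris1"].slice(0, 5))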
pyarrow.Table
sepal_length: double
sepal_width: double
petal_length: double
petal_width: double
species: string
----
sepal_length: [[5.4,4.8,5.7,5.4,5.1]]
sepal_width: [[3.9,3.4,3.8,3.4,3.3]]
petal_length: [[1.7,1.6,1.7,1.7,1.7]]
petal_width: [[0.4,0.2,0.3,0.2,0.5]]
species: [["setosa","setosa","setosa","setosa","setosa"]]