Quickstart

0. Installation

pip install limulus

For pandas support, install the optional extra with pip install limulus[pandas].

1. Prepare Input Data

To explore basic operations, let’s start with the well-known Iris flower dataset.
The standard input formats are pyarrow.Table and polars.DataFrame; pandas.DataFrame is also accepted when the optional pandas extra is installed.

import pandas as pd

iris_data = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')

2. Load into a Session

limulus manages data through a Session object, similar in concept to a work library.
After creating a Session, call loads() to register your data so that DATA steps can use it.

import limulus

session = limulus.Session()
session.loads({"iris": iris_data})

3. Run a DATA Step

DATA step code is executed with submit() (or run).
Reference datasets by table name only, or as work.tablename.
Multiple DATA steps can be submitted at once; tables created by earlier steps can be referenced by subsequent steps.

session.submit("""
data iris1 ;
  set iris;
  where petal_length > 1.5 ;
run;

data work.iris2 ;
  set iris1;
run;
""")

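For readers new to DATA step syntax, the following plain-Python sketch mirrors what the two steps above are expected to do, assuming limulus follows the usual DATA step semantics: where keeps only the rows that satisfy the condition, and a set with no other statements copies the table. The work dictionary and the reduced sample rows are illustrative stand-ins, not part of the limulus API.

```python
# A dict standing in for the work library: table name -> list of rows.
work = {
    "iris": [
        {"petal_length": 1.4, "species": "setosa"},
        {"petal_length": 1.5, "species": "setosa"},
        {"petal_length": 1.7, "species": "setosa"},
        {"petal_length": 1.6, "species": "setosa"},
    ]
}

# data iris1; set iris; where petal_length > 1.5; run;
work["iris1"] = [row for row in work["iris"] if row["petal_length"] > 1.5]

# data work.iris2; set iris1; run;
# "work.iris2" and "iris2" name the same table, so the key is just "iris2".
work["iris2"] = [dict(row) for row in work["iris1"]]

print(len(work["iris1"]), len(work["iris2"]))  # prints: 2 2
```

Note that the row with petal_length equal to 1.5 is dropped, because the comparison is strict.
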
4. Retrieving and Working with Datasets

limulus uses Arrow as its data exchange format.
session[table] retrieves an Arrow table.
session.dataset(table) returns a dataset object that supports conversions, sorting, and more.

ar_table = session["iris1"]
pl_table = session.dataset("iris").to_polars()

print(pl_table.head(5))

shape: (5, 5)
┌──────────────┬─────────────┬──────────────┬─────────────┬─────────┐
│ sepal_length ┆ sepal_width ┆ petal_length ┆ petal_width ┆ species │
│ ---          ┆ ---         ┆ ---          ┆ ---         ┆ ---     │
│ f64          ┆ f64         ┆ f64          ┆ f64         ┆ str     │
╞══════════════╪═════════════╪══════════════╪═════════════╪═════════╡
│ 5.1          ┆ 3.5         ┆ 1.4          ┆ 0.2         ┆ setosa  │
│ 4.9          ┆ 3.0         ┆ 1.4          ┆ 0.2         ┆ setosa  │
│ 4.7          ┆ 3.2         ┆ 1.3          ┆ 0.2         ┆ setosa  │
│ 4.6          ┆ 3.1         ┆ 1.5          ┆ 0.2         ┆ setosa  │
│ 5.0          ┆ 3.6         ┆ 1.4          ┆ 0.2         ┆ setosa  │
└──────────────┴─────────────┴──────────────┴─────────────┴─────────┘

5. Execution Logs

When execution succeeds, no log is printed by default. To inspect the log explicitly (even after success), use session.log (or session.get_log()) to get the log object, then call its print_log() method to display the entries.

When execution fails, an error log is printed automatically.

session.log.print_log()


# failure case
session.submit("""
data iris3 ;
  set dummy;
run;
""")

success: True
0.01 seconds elapsed
(no log entries)
success: False
0.0 seconds elapsed
Error entries: 1
[RUNTIME_SET_DATASET_NOT_FOUND] Error: [stage: validate] Input dataset is not provided: dummy
   ╭─[ <dsl>:2:3 ]
   │
 2 │   set dummy;
   │   ────┬────  
   │       ╰────── missing dataset
───╯


Error[RUNTIME_SET_DATASET_NOT_FOUND] [stage: validate]: Input dataset is not provided: dummy
 --> <dsl>:2:3
  |
2 |   set dummy;
  |   ^^^^^^^^^ missing dataset
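
If the printed log text is captured somewhere, the error code and stage can be pulled out of the summary line with a regular expression. This is only a sketch based on the single sample line shown above; the pattern and the helper name parse_error_line are assumptions, and limulus may offer a structured way to read the same fields.

```python
import re

# Pattern inferred from the sample line above; not an official format spec.
ERROR_LINE = re.compile(
    r"^\[(?P<code>[A-Z0-9_]+)\] Error: \[stage: (?P<stage>\w+)\] (?P<message>.*)$"
)

def parse_error_line(line):
    """Return code, stage and message as a dict, or None if the line does not match."""
    match = ERROR_LINE.match(line.strip())
    return match.groupdict() if match else None

sample = (
    "[RUNTIME_SET_DATASET_NOT_FOUND] Error: [stage: validate] "
    "Input dataset is not provided: dummy"
)
print(parse_error_line(sample))
```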

6. Removing Data

Use unload to remove data that is no longer needed.

session.unload(["iris1"])

True

One-Shot Execution

To run code once without creating a session, use limulus.submit(), passing each input table as a keyword argument.
Retrieve a resulting dataset from the returned object with result.datasets[key].

rc = limulus.submit(
    "data iris1; set src_iris; where petal_length > 1.5; run;",
    src_iris=iris_data,
)

print(rc.datasets["iris1"].to_pandas().head(5))

   sepal_length  sepal_width  petal_length  petal_width species
         5.4          3.9           1.7          0.4  setosa
         4.8          3.4           1.6          0.2  setosa
         5.7          3.8           1.7          0.3  setosa
         5.4          3.4           1.7          0.2  setosa
         5.1          3.3           1.7          0.5  setosa
