Quickstart
0. Installation
pip install limulus
For pandas support, install the optional extra with pip install limulus[pandas].
1. Prepare Input Data
To explore basic operations, let’s start with the well-known Iris flower dataset.
Standard input formats are pyarrow.Table and polars.DataFrame; pandas.DataFrame is also accepted when the optional pandas extra is installed.
import pandas as pd
iris_data = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
2. Load into a Session
limulus manages data through a Session object, which is similar in concept to a work library.
After creating a Session, call loads() to register your data so it can be used in DATA steps.
import limulus
session = limulus.Session()
session.loads({"iris": iris_data})
3. Run a DATA Step
DATA step code is executed with submit() (or run).
Reference datasets by table name only, or as work.tablename.
Multiple DATA steps can be submitted at once; tables created by earlier steps can be referenced by subsequent steps.
session.submit("""
data iris1;
    set iris;
    where petal_length > 1.5;
run;

data work.iris2;
    set iris1;
run;
""")
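For readers new to DATA step syntax, the two steps above can be read as "filter, then copy". The following plain-Python sketch mirrors that logic on four petal_length values taken from the outputs in this guide. It does not use limulus; the `work` dictionary and the `data_step` helper are hypothetical stand-ins, not part of the library.

```python
# Hypothetical stand-in for a session's work library: table name -> list of rows.
work = {
    "iris": [
        {"petal_length": 1.4, "species": "setosa"},
        {"petal_length": 1.5, "species": "setosa"},
        {"petal_length": 1.7, "species": "setosa"},
        {"petal_length": 1.6, "species": "setosa"},
    ]
}


def data_step(out, src, where=None):
    """Copy rows from work[src] into work[out], keeping rows that pass `where`."""
    rows = work[src]
    work[out] = [dict(r) for r in rows if where is None or where(r)]


# data iris1; set iris; where petal_length > 1.5; run;
data_step("iris1", "iris", where=lambda r: r["petal_length"] > 1.5)

# data work.iris2; set iris1; run;
data_step("iris2", "iris1")

print([r["petal_length"] for r in work["iris2"]])  # [1.7, 1.6]
```

The second step can read iris1 only because the first step has already written it, which is the ordering that lets later steps reference tables created by earlier ones.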
4. Retrieving and Working with Datasets
limulus uses Arrow as its data exchange format.
session[table] retrieves an Arrow table.
session.dataset(table) returns a dataset object from which you can perform conversions, sorting, and more.
ar_table = session["iris1"]
pl_table = session.dataset("iris").to_polars()
print(pl_table.head(5))
shape: (5, 5)
┌──────────────┬─────────────┬──────────────┬─────────────┬─────────┐
│ sepal_length ┆ sepal_width ┆ petal_length ┆ petal_width ┆ species │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 ┆ f64 ┆ str │
╞══════════════╪═════════════╪══════════════╪═════════════╪═════════╡
│ 5.1 ┆ 3.5 ┆ 1.4 ┆ 0.2 ┆ setosa │
│ 4.9 ┆ 3.0 ┆ 1.4 ┆ 0.2 ┆ setosa │
│ 4.7 ┆ 3.2 ┆ 1.3 ┆ 0.2 ┆ setosa │
│ 4.6 ┆ 3.1 ┆ 1.5 ┆ 0.2 ┆ setosa │
│ 5.0 ┆ 3.6 ┆ 1.4 ┆ 0.2 ┆ setosa │
└──────────────┴─────────────┴──────────────┴─────────────┴─────────┘
5. Execution Logs
When execution succeeds, no log is printed by default. To inspect the log explicitly (even after a successful run), use session.log (or session.get_log()) to get the log object, then call its print_log() method to display the entries.
When execution fails, an error log is printed automatically.
session.log.print_log()
# failure case
session.submit("""
data iris3 ;
set dummy;
run;
""")
success: True
0.02 seconds elapsed
(no log entries)
success: False
0.0 seconds elapsed
Error
[stage: resolve inputs]: Input dataset is not provided: dummy
6. Removing Data
Use unload to remove data that is no longer needed.
session.unload(["iris1"])
True
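One-Shot Execution
If you want to run code once without creating a session, use limulus.submit().
Input tables are passed as keyword arguments (src_iris in the example below), and each resulting dataset is available from the returned object as datasets[key].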
rc = limulus.submit(
    "data iris1; set src_iris; where petal_length > 1.5; run;",
    src_iris=iris_data,
)
print(rc.datasets["iris1"].slice(0, 5))
pyarrow.Table
sepal_length: double
sepal_width: double
petal_length: double
petal_width: double
species: string
----
sepal_length: [[5.4,4.8,5.7,5.4,5.1]]
sepal_width: [[3.9,3.4,3.8,3.4,3.3]]
petal_length: [[1.7,1.6,1.7,1.7,1.7]]
petal_width: [[0.4,0.2,0.3,0.2,0.5]]
species: [["setosa","setosa","setosa","setosa","setosa"]]