Sentinel Data Lake Notebooks: A Step-by-Step Guide

Introduction

In Part 6, we explored what notebooks are and why they matter in Microsoft Sentinel Data Lake. Now it’s time to roll up our sleeves and see notebooks in action.

In this post, we’ll walk through how analysts can use Jupyter Notebooks powered by Spark directly inside the Sentinel Data Lake. We will query sign-in logs, parse location details, and preview results, all without managing clusters or infrastructure.

📚 Reference: https://learn.microsoft.com/en-us/azure/sentinel/datalake/notebooks

1. The New Integration: Jupyter + Spark

Microsoft Sentinel Data Lake supports running Python notebooks on managed Spark compute through the Microsoft Sentinel extension for VS Code. This integration lets you:

Use Python libraries such as PySpark and Pandas alongside Sentinel APIs.
Execute distributed Spark jobs for large-scale analysis.
Schedule notebooks as recurring jobs without cluster configuration.

2. Getting Started with Your First Spark Notebook

Step 1 – Launch from VS Code

First, install the Microsoft Sentinel extension in Visual Studio Code. Next, connect to your Sentinel Data Lake workspace and create a new notebook using Managed Spark compute.

Step 2 – Connect to Data

Begin by using the built-in Sentinel provider to read a table from your workspace into a Spark DataFrame.

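from sentinel_lake.providers import MicrosoftSentinelProvider

# `spark` is the Spark session that the notebook runtime is expected to provide
dataprovider = MicrosoftSentinelProvider(spark)

workspacename = "CyberSOC"  # Replace with your workspace
tablename = "SigninLogs"

# Load the table into a Spark DataFrame
df = dataprovider.read_table(tablename, workspacename)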

Step 3 – Parse Location Details

Then, convert the JSON string in LocationDetails into a structured column so that city, state, and country or region can be selected directly.

from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType

# Schema for the fields we want out of the LocationDetails JSON string
locationschema = StructType([
    StructField("city", StringType(), True),
    StructField("state", StringType(), True),
    StructField("countryOrRegion", StringType(), True)
])

# Replace the raw JSON string with a struct column parsed against the schema
df = df.withColumn("LocationDetails", from_json(col("LocationDetails"), locationschema))
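To make the effect of the schema concrete, here is a minimal plain-Python sketch of what the parse does to a single value. It is not how Spark implements `from_json`. The helper name `parse_location` and the sample string are hypothetical, and the sketch only mirrors two behaviors: fields named in the schema but absent from the JSON come back as null (`None` here), and JSON keys not named in the schema are dropped.

```python
import json

# Field names taken from the schema defined above.
LOCATION_FIELDS = ("city", "state", "countryOrRegion")


def parse_location(raw):
    """Parse one LocationDetails JSON string into a dict with exactly the schema's fields."""
    data = json.loads(raw)
    # Keep only the schema fields; a missing field becomes None, extra keys are ignored.
    return {field: data.get(field) for field in LOCATION_FIELDS}


# Hypothetical sample value, not real sign-in data.
sample = '{"city": "Seattle", "countryOrRegion": "US", "geoCoordinates": {"latitude": 47.6}}'
parsed = parse_location(sample)
print(parsed)
```

Because every field in the schema is nullable, a row with partial location data should still parse rather than fail, which is worth keeping in mind when filtering on these columns later.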

Step 4 – Preview Results

Finally, select the relevant columns and display the five most recent entries.

signinlocationsdf = df.select(
    "UserPrincipalName",
    "CreatedDateTime",
    "IPAddress",
    "LocationDetails.city",
    "LocationDetails.state",
    "LocationDetails.countryOrRegion"
).orderBy("CreatedDateTime", ascending=False)

signinlocationsdf.show(5)
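The sort-then-limit logic of this preview can be sketched in plain Python on a handful of rows, which may help when checking that the preview contains what you expect. The rows below are invented for illustration, and `latest_sign_ins` is a hypothetical helper, not part of any Sentinel or Spark library.

```python
from datetime import datetime


def latest_sign_ins(rows, n=5):
    """Return the n most recent rows, newest first, ordered by CreatedDateTime."""
    return sorted(
        rows,
        key=lambda row: datetime.fromisoformat(row["CreatedDateTime"]),
        reverse=True,
    )[:n]


# Invented sample rows; the timestamps are deliberately out of order.
rows = [
    {"UserPrincipalName": "a@contoso.com", "CreatedDateTime": "2025-01-03T09:00:00"},
    {"UserPrincipalName": "b@contoso.com", "CreatedDateTime": "2025-01-01T08:00:00"},
    {"UserPrincipalName": "c@contoso.com", "CreatedDateTime": "2025-01-06T12:30:00"},
    {"UserPrincipalName": "d@contoso.com", "CreatedDateTime": "2025-01-02T17:45:00"},
    {"UserPrincipalName": "e@contoso.com", "CreatedDateTime": "2025-01-05T07:15:00"},
    {"UserPrincipalName": "f@contoso.com", "CreatedDateTime": "2025-01-04T23:59:00"},
]

preview = latest_sign_ins(rows)
for row in preview:
    print(row["CreatedDateTime"], row["UserPrincipalName"])
```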

3. Why This Matters

This simple workflow shows how analysts can pull raw security data from Sentinel Data Lake with notebooks, transform nested fields for easier analysis, and preview results before scaling to full datasets.

4. Scheduling and Automation

Once your notebook produces consistent results, schedule it to run periodically. For example, use the job scheduling options in VS Code, choose a frequency such as daily or weekly, and save outputs to the lake or the Analytics tier.
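A scheduled notebook usually only needs the data that arrived since its previous run. One way to sketch that, assuming the notebook knows its own schedule frequency, is to derive a time window from the frequency. The helper name `lookback_window` and the frequency labels are hypothetical; nothing here is part of the Sentinel extension.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical mapping from a schedule frequency to its lookback period.
LOOKBACK = {"daily": timedelta(days=1), "weekly": timedelta(weeks=1)}


def lookback_window(frequency, now=None):
    """Return (start, end) in UTC covering one schedule period ending at `now`."""
    if frequency not in LOOKBACK:
        raise ValueError(f"unsupported frequency: {frequency!r}")
    end = now or datetime.now(timezone.utc)
    return end - LOOKBACK[frequency], end


# Fixed `now` so the example is reproducible.
start, end = lookback_window("daily", now=datetime(2025, 1, 8, tzinfo=timezone.utc))
print(start.isoformat(), end.isoformat())
```

In the notebook, the two bounds could then be applied as a filter on `CreatedDateTime` before any further processing, assuming that column is a timestamp rather than a string.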

5. Elevating Insights Back to Sentinel

To make your results actionable, publish findings to the Analytics tier as custom tables, trigger alerts or incidents from detected anomalies, and share notebooks with other SOC members to standardize analysis.
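What counts as an anomaly depends on your environment. As one illustrative possibility, a notebook might flag sign-ins from a country or region that a user has not been seen in before. The sketch below is plain Python rather than a Spark job or a Sentinel API; the function name, the row shape, and the sample data are all hypothetical.

```python
def new_country_sign_ins(rows):
    """Return rows whose countryOrRegion is new for that user.

    Rows are assumed to be in chronological order. A user's first sign-in
    sets the baseline and is not flagged.
    """
    seen = {}
    flagged = []
    for row in rows:
        user, country = row["UserPrincipalName"], row["countryOrRegion"]
        known = seen.setdefault(user, set())
        if known and country not in known:
            flagged.append(row)
        known.add(country)
    return flagged


# Invented sample data in chronological order.
sign_ins = [
    {"UserPrincipalName": "alice@contoso.com", "countryOrRegion": "US"},
    {"UserPrincipalName": "alice@contoso.com", "countryOrRegion": "US"},
    {"UserPrincipalName": "bob@contoso.com", "countryOrRegion": "DE"},
    {"UserPrincipalName": "alice@contoso.com", "countryOrRegion": "FR"},
    {"UserPrincipalName": "bob@contoso.com", "countryOrRegion": "DE"},
]

flagged = new_country_sign_ins(sign_ins)
print(flagged)
```

Rows flagged this way are the kind of finding that could be written to a custom table and reviewed before any alert is wired up.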

6. Best Practices

Start with sample datasets before scaling to months of logs.

Keep notebooks modular with clearly labeled sections.

Leverage version control (GitHub or Azure DevOps) for collaboration.

Validate ML models before deployment to avoid false positives.

Monitor costs when scheduling large Spark jobs.

Conclusion

Jupyter Notebooks and Spark bring scalable analytics to Microsoft Sentinel Data Lake. You can query logs, enrich data, and build repeatable workflows without managing infrastructure.

This marks the next stage of the Unlocking Scalable Security Analytics series:

👉 In 2 weeks when I return from vacation, we’ll explore data governance and retention management in Sentinel Data Lake to keep your analytics cost-efficient and compliant.