Introduction
In Part 6, we explored what notebooks are and why they matter in Microsoft Sentinel Data Lake. Now it’s time to roll up our sleeves and see notebooks in action.
In this post, we’ll walk through how analysts can use Jupyter Notebooks powered by Spark directly inside the Sentinel Data Lake. We’ll query sign-in logs, parse location details, and preview the results, all without managing clusters or infrastructure.
📚 Reference: https://learn.microsoft.com/en-us/azure/sentinel/datalake/notebooks
1. The New Integration: Jupyter + Spark
Microsoft Sentinel Data Lake supports running Python notebooks on managed Spark compute through the VS Code Sentinel extension. This integration lets you:
- Use Python libraries such as PySpark and Pandas alongside Sentinel APIs.
- Execute distributed Spark jobs for large-scale analysis.
- Schedule notebooks as recurring jobs without cluster configuration.
2. Getting Started with Your First Spark Notebook
Step 1 – Launch from VS Code
First, install the Microsoft Sentinel extension in Visual Studio Code. Next, connect to your Sentinel Data Lake workspace and create a new notebook using Managed Spark compute.
Step 2 – Connect to Data
Begin by using the built-in Sentinel provider to read a table into a Spark DataFrame.
from sentinel_lake.providers import MicrosoftSentinelProvider

# `spark` is the session provided by the managed Spark notebook kernel.
data_provider = MicrosoftSentinelProvider(spark)
workspace_name = "CyberSOC"  # Replace with your workspace
table_name = "SigninLogs"
# Load the table into a Spark DataFrame.
df = data_provider.read_table(table_name, workspace_name)
Step 3 – Parse Location Details
Then, convert the JSON string in the LocationDetails column into structured columns for easier analysis.
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType

# Only the fields listed in this schema are extracted from the JSON string.
location_schema = StructType([
    StructField("city", StringType(), True),
    StructField("state", StringType(), True),
    StructField("countryOrRegion", StringType(), True)
])
# Replace the JSON string column with a struct column of the same name.
df = df.withColumn("LocationDetails", from_json(col("LocationDetails"), location_schema))
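To make the effect of the schema concrete, here is a plain-Python approximation of what happens to a single record. It uses only the standard library and is not how Spark implements from_json; the helper name parse_location and the sample values are made up for illustration.

```python
import json

# Field names copied from the location schema in Step 3.
LOCATION_FIELDS = ("city", "state", "countryOrRegion")


def parse_location(raw):
    """Approximate, for one record, what the schema-driven parse yields."""
    try:
        parsed = json.loads(raw) if raw else {}
    except (TypeError, ValueError):
        # Unparseable input: every field ends up empty instead of raising.
        parsed = {}
    if not isinstance(parsed, dict):
        parsed = {}
    # Keep only the schema fields; anything missing becomes None (null in Spark).
    return {name: parsed.get(name) for name in LOCATION_FIELDS}


# Hypothetical sample values, not real sign-in data.
sample = (
    '{"city": "Toronto", "state": "Ontario", "countryOrRegion": "CA", '
    '"geoCoordinates": {"latitude": 43.6}}'
)
print(parse_location(sample))              # extra keys are dropped
print(parse_location('{"city": "Lyon"}'))  # missing keys become None
print(parse_location("not json"))          # malformed input gives all None
```

The takeaway for the notebook is that keys outside the schema are ignored and missing or malformed values surface as nulls, so it is worth checking for null cities or countries before drawing conclusions from the parsed columns.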
Step 4 – Preview Results
Finally, select the relevant columns and display the five most recent entries.
signin_locations_df = df.select(
    "UserPrincipalName",
    "CreatedDateTime",
    "IPAddress",
    "LocationDetails.city",
    "LocationDetails.state",
    "LocationDetails.countryOrRegion"
).orderBy("CreatedDateTime", ascending=False)  # newest sign-ins first
signin_locations_df.show(5)

3. Why This Matters
This simple workflow demonstrates how analysts can pull raw security data from Sentinel Data Lake into a notebook, transform nested fields for easier analysis, and preview results before scaling to full datasets.
4. Scheduling and Automation
Once your notebook produces consistent results, schedule it to run periodically. Use the job scheduling options in VS Code, choose a frequency such as daily or weekly, and save outputs to your lake or Analytics tier.
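A scheduled notebook usually should process only the data that arrived since its previous run. The sketch below shows one way to derive that time window from the chosen frequency. The function name run_window and the frequency-to-lookback mapping are assumptions for illustration, not part of the Sentinel scheduling feature.

```python
from datetime import datetime, timedelta, timezone

# Assumed mapping from schedule frequency to lookback length.
LOOKBACK = {"daily": timedelta(days=1), "weekly": timedelta(days=7)}


def run_window(frequency, now=None):
    """Return (start, end) in UTC covering one schedule period ending at `now`."""
    if frequency not in LOOKBACK:
        raise ValueError(f"unsupported frequency: {frequency!r}")
    end = now or datetime.now(timezone.utc)
    return end - LOOKBACK[frequency], end


start, end = run_window("daily")
print(start.isoformat(), "->", end.isoformat())

# In the notebook, the window could then bound the query, for example:
#   from pyspark.sql.functions import col
#   df = df.filter((col("CreatedDateTime") >= start) & (col("CreatedDateTime") < end))
```

Using a half-open window (start inclusive, end exclusive) keeps consecutive runs from counting the same sign-in twice.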

5. Elevating Insights Back to Sentinel
To make your results actionable, publish findings to the Analytics tier as custom tables. Additionally, trigger alerts or incidents from detected anomalies and share notebooks with other SOC members to standardize analysis.
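As a rough sketch of the publishing step, the helper below wraps a write through the Sentinel provider. Treat the details as assumptions to verify against the current provider reference: that the provider exposes save_as_table(dataframe, table_name, workspace_name), and that custom Analytics tier tables written from notebooks must end in _SPRK_CL. The helper name publish_findings is hypothetical.

```python
# Assumption: custom tables written to the Analytics tier from a notebook
# need this suffix. Check the Sentinel provider reference before relying on it.
ANALYTICS_SUFFIX = "_SPRK_CL"


def publish_findings(data_provider, df, table_name, workspace_name):
    """Write a findings DataFrame to a custom table via the Sentinel provider."""
    if not table_name.endswith(ANALYTICS_SUFFIX):
        raise ValueError(
            f"expected a table name ending in {ANALYTICS_SUFFIX}, got {table_name!r}"
        )
    # Assumed signature: save_as_table(dataframe, table_name, workspace_name).
    data_provider.save_as_table(df, table_name, workspace_name)
    return table_name
```

In a notebook you would call it with the provider from Step 2, the DataFrame from Step 4, a table name of your choosing that carries the suffix, and your workspace name. Failing fast on the name avoids launching a Spark write that would be rejected later.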
6. Best Practices
- Start with sample datasets before scaling to months of logs.
- Keep notebooks modular with clearly labeled sections.
- Use version control (GitHub or Azure DevOps) for collaboration.
- Validate ML models before deployment to avoid false positives.
- Monitor costs when scheduling large Spark jobs.
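For the first practice, one possible way to work on a small slice is to narrow the DataFrame before any heavy processing. This is a sketch under assumptions: the helper name small_sample and its default values are illustrative, and it expects a Spark DataFrame with a CreatedDateTime timestamp column, as in SigninLogs.

```python
def small_sample(df, days=7, fraction=0.1, max_rows=1000, seed=42):
    """Narrow a Spark DataFrame to a small, recent slice for development runs."""
    # Keep only recent rows; the condition is a Spark SQL expression string.
    recent = df.filter(
        f"CreatedDateTime >= current_timestamp() - INTERVAL {int(days)} DAYS"
    )
    # Take a random fraction (fixed seed for repeatability), then cap the size.
    return recent.sample(fraction=fraction, seed=seed).limit(max_rows)
```

Once the logic behaves as expected on the sample, run the same notebook against the full DataFrame by skipping this helper.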
Conclusion
Jupyter Notebooks and Spark bring scalable analytics to Microsoft Sentinel Data Lake. You can query logs, enrich data, and build repeatable workflows without managing infrastructure.
This marks the next stage of the Unlocking Scalable Security Analytics series:
- Part 1: Why pair Sentinel with a Data Lake
- Part 2: How integration slashes costs
- Part 3: How to set up Sentinel Data Lake
- Part 4: How to Optimize KQL Queries in Sentinel Data Lake
- Part 5: How to Automate KQL Jobs in Sentinel Data Lake
- Part 6: Explore Sentinel Data Lake Notebooks
👉 In two weeks, when I return from vacation, we’ll explore data governance and retention management in Sentinel Data Lake to keep your analytics cost-efficient and compliant.
