Introduction
In Part 1, we explored why the Microsoft Sentinel data lake delivers a scalable foundation for modern security analytics. Now, in Part 2, we take the next step: connecting your Syslog collector virtual machine (VM), configuring a Data Collection Rule (DCR), and validating that data is flowing into your existing custom table.
Because your table was already created in Part 1, this guide focuses on the essential setup steps needed to start ingesting your Syslog events into the data lake quickly and efficiently.
Prerequisites
Before configuring ingestion, ensure you have:
- A Syslog collector VM (Azure or on-premises) with outbound internet access
- The Azure Monitor Agent (AMA) installed
- A Microsoft Sentinel workspace connected to the data lake
- The custom table created in Part 1 (in this walkthrough: Syslog_datalake_CL)
- Permissions: Log Analytics Contributor or Monitoring Contributor
- All environment setup from Part 1 completed
With these components ready, you can begin configuring ingestion.

Step 1 – Connect the Syslog Collector VM
In this walkthrough, Azure Arc was used to deploy the agent to the host. Start by confirming that your VM is properly connected:
- Navigate to Azure Portal → Azure Arc → Machines → [Your VM] (or Virtual Machines → [Your VM] for an Azure-hosted collector)
- Select Settings → Extensions
- Verify that AzureMonitorLinuxAgent is installed and healthy
- Open Monitoring → Insights
- Confirm that the VM is sending heartbeat and performance data to the correct workspace
If performance charts appear, the Azure Monitor Agent is operating correctly.
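The same check can be scripted instead of clicked through. The sketch below builds the Azure CLI command for listing extensions on an Arc-enabled machine; the machine and resource group names are hypothetical placeholders, and `az connectedmachine` requires the `connectedmachine` CLI extension.

```python
# Sketch: CLI alternative to the portal check above, assuming an
# Arc-enabled machine. Names below are hypothetical placeholders.
machine = "syslog-collector"      # hypothetical machine name
resource_group = "security_ops"   # hypothetical resource group

# List extensions and pull out the AMA provisioning state.
cmd = [
    "az", "connectedmachine", "extension", "list",
    "--machine-name", machine,
    "--resource-group", resource_group,
    "--query", "[?name=='AzureMonitorLinuxAgent'].properties.provisioningState",
    "--output", "tsv",
]
# Print the command; run it via subprocess.run(cmd) after `az login`.
print(" ".join(cmd))
```

A result of `Succeeded` indicates the agent extension deployed cleanly; an empty result means AMA is not installed on that machine.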
Step 2 – Configure the Data Collection Rule (DCR)
The DCR determines what data is collected and where it is routed. Since your destination table already exists, the only requirement is to map Syslog events to the data lake.
1. Create the DCR rule file
Save the following as rule.json, updating your subscription ID, workspace name, and resource group as needed:
{
  "location": "eastus2",
  "kind": "Linux",
  "properties": {
    "dataSources": {
      "syslog": [
        {
          "name": "sysLogsDataSource-1685860449",
          "streams": [ "Microsoft-Syslog" ],
          "facilityNames": [ "*" ],
          "logLevels": [
            "Info", "Notice", "Warning", "Error", "Critical", "Alert", "Emergency"
          ]
        }
      ]
    },
    "destinations": {
      "logAnalytics": [
        {
          "name": "DataCollectionEvent",
          "workspaceResourceId": "/subscriptions/{subscription-id}/resourceGroups/{resource-group}/providers/Microsoft.OperationalInsights/workspaces/{workspace-name}"
        }
      ]
    },
    "dataFlows": [
      {
        "streams": [ "Microsoft-Syslog" ],
        "destinations": [ "DataCollectionEvent" ],
        "transformKql": "source | where SyslogMessage has \"Palo Alto Networks\" | extend Device = Computer, Message = SyslogMessage, Severity = tostring(SeverityLevel) | project TimeGenerated, Device, Message, Severity",
        "outputStream": "Custom-Syslog_datalake_CL"
      }
    ]
  }
}
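If you would rather not hand-edit the placeholders, the workspaceResourceId can be assembled programmatically. A minimal sketch, using hypothetical subscription, resource group, and workspace names:

```python
import json

# Sketch: build the workspaceResourceId for rule.json instead of
# hand-editing it. All values below are hypothetical placeholders.
subscription_id = "00000000-0000-0000-0000-000000000000"
resource_group = "security_ops"
workspace_name = "my-sentinel-workspace"

workspace_resource_id = (
    f"/subscriptions/{subscription_id}"
    f"/resourceGroups/{resource_group}"
    f"/providers/Microsoft.OperationalInsights/workspaces/{workspace_name}"
)

# Minimal slice of the DCR above, with the placeholder swapped in.
destinations = {
    "logAnalytics": [
        {
            "name": "DataCollectionEvent",
            "workspaceResourceId": workspace_resource_id,
        }
    ]
}
print(json.dumps(destinations, indent=2))
```

Paste the printed destinations block into rule.json, or extend the script to patch and rewrite the whole file.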
2. Deploy the DCR using Azure CLI
az monitor data-collection rule create \
  --resource-group security_ops \
  --name TEST_DCR \
  --rule-file rule.json
3. Assign your Syslog VM to the DCR
You can perform this step through the Azure Portal:
- Search Data Collection Rules
- Open TEST_DCR
- Go to Configuration → Resources
- Drill into your resource group
- Check the box beside your collector VM
- Select Apply
Once assigned, the VM begins forwarding Syslog events to the data lake; allow a few minutes for the first records to arrive.
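The portal steps above can also be performed with az monitor data-collection rule association create. The sketch below builds the two resource IDs the command needs; the subscription ID, association name, and machine name are hypothetical, and the machine ID assumes an Arc-enabled host (use Microsoft.Compute/virtualMachines for an Azure VM).

```python
# Sketch: associate the collector VM with the DCR via the Azure CLI.
# Subscription, association, and machine names are hypothetical.
subscription_id = "00000000-0000-0000-0000-000000000000"
resource_group = "security_ops"

# Resource ID of the DCR deployed in the previous step.
dcr_id = (
    f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
    "/providers/Microsoft.Insights/dataCollectionRules/TEST_DCR"
)
# Resource ID of the Arc-enabled collector machine.
vm_id = (
    f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
    "/providers/Microsoft.HybridCompute/machines/syslog-collector"
)

cmd = [
    "az", "monitor", "data-collection", "rule", "association", "create",
    "--name", "syslog-dcr-assoc",
    "--rule-id", dcr_id,
    "--resource", vm_id,
]
print(" ".join(cmd))
```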
Step 3 – Validate Syslog Data Flow into the Sentinel data lake
After a few minutes, verify ingestion using KQL in the Defender Portal:
Defender → Microsoft Sentinel → Data Lake → Query
Syslog_datalake_CL
| take 20
You should see records containing:
- TenantID
- TimeGenerated
- Device
- Message
- Severity
- WorkspaceID
If entries appear, your pipeline is active and functioning correctly.
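To reason about what those records should look like before data arrives, the transformKql from the DCR can be mirrored locally. This is a rough sketch with a hypothetical sample record; note that KQL's has does term matching, whereas the Python substring check below is a simplification.

```python
# Sketch: local preview of the DCR transform, applied to a hypothetical
# record from the Microsoft-Syslog stream.
sample = {
    "TimeGenerated": "2024-06-01T12:00:00Z",
    "Computer": "fw-edge-01",
    "SyslogMessage": "Palo Alto Networks firewall TRAFFIC log",
    "SeverityLevel": "warning",
}

def transform(record):
    """Mirror of the transformKql: filter on the vendor string,
    then rename and project the columns."""
    # Approximates `where SyslogMessage has "Palo Alto Networks"`.
    if "Palo Alto Networks" not in record["SyslogMessage"]:
        return None
    return {
        "TimeGenerated": record["TimeGenerated"],
        "Device": record["Computer"],       # extend Device = Computer
        "Message": record["SyslogMessage"], # extend Message = SyslogMessage
        "Severity": str(record["SeverityLevel"]),
    }

row = transform(sample)
print(row)
```

Records from other vendors are dropped by the filter, which is why a healthy collector can still show fewer rows in Syslog_datalake_CL than it receives.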

Troubleshooting Tips
If no data appears, review the following:
1. Check the Azure Monitor Agent
- Ensure AMA is running
- Restart the VM
- Restart the AMA extension
- Verify NSG/firewalls allow outbound port 443
2. Validate your DCR configuration
- Confirm the correct workspaceResourceId
- Ensure the stream names and destination names match exactly
3. Verify VM associations
Check the DCR under:
Data Collection Rules → [Your DCR] → Associations
4. Review Azure Activity Logs
Look for:
- Permission issues
- Policy blocks
- Resource creation failures
Most ingestion problems come from small mismatches in DCR names or workspace IDs.
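Those name mismatches are easy to catch before deployment. A minimal sketch that checks, against an inline copy of the relevant rule.json fields, that every destination referenced in dataFlows is actually declared and that the output stream targets a custom table:

```python
# Sketch: sanity-check the DCR for the common mismatches noted above,
# using an inline copy of the relevant rule.json fields.
rule = {
    "properties": {
        "destinations": {
            "logAnalytics": [{"name": "DataCollectionEvent"}]
        },
        "dataFlows": [
            {
                "destinations": ["DataCollectionEvent"],
                "outputStream": "Custom-Syslog_datalake_CL",
            }
        ],
    }
}

props = rule["properties"]
declared = {d["name"] for d in props["destinations"]["logAnalytics"]}
for flow in props["dataFlows"]:
    # Every destination a flow references must be declared above.
    missing = set(flow["destinations"]) - declared
    assert not missing, f"dataFlows references unknown destinations: {missing}"
    # Custom-table output streams must carry the Custom- prefix.
    assert flow["outputStream"].startswith("Custom-"), \
        "outputStream for a custom table must start with 'Custom-'"
print("DCR destination names and output stream are consistent")
```

Swap the inline dict for json.load(open("rule.json")) to validate the real file before running az monitor data-collection rule create.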
Why Syslog Ingestion Into the Data Lake Matters
Streaming Syslog directly into the Microsoft Sentinel data lake offers significant advantages:
✔ Cost Efficiency
Data lake ingestion and storage cost far less than traditional Log Analytics retention.
✔ Scalability
Handle massive Syslog volumes without workspace retention constraints.
✔ Flexibility
Access the same data through:
- Standard Sentinel KQL
- Data lake exploration
- Spark notebooks
- Machine learning and analytics jobs
This flexibility strengthens long-term threat hunting and analytic capabilities.
Conclusion
With your Syslog collector VM connected, your DCR deployed, and ingestion validated, you now have a functioning pipeline that streams Syslog data directly into the Microsoft Sentinel data lake. This setup forms the foundation for advanced analytics, ML workloads, and scalable detection patterns covered in later posts.
For previous articles in this series, visit:
➡️ Home – Its Security Day with Mike