Azure Databricks
This guide takes you through forwarding logs from an Azure Databricks cluster to LOGIQ. Before you proceed with this setup, ensure that you meet the following prerequisites.
  • A private Azure virtual network (VNet)
  • An Azure Databricks cluster deployed in the private VNet
  • A LOGIQ endpoint
Note: The Databricks cluster must be launched in your own private VNet. Failing this, the default deployment of the Databricks cluster is fully managed by Azure: the resource group is locked and SSH connections to the nodes are disabled.
For more information on deploying Azure Databricks in your own private VNet, refer to Deploy Azure Databricks in your Azure virtual network (VNet injection).

Configuring your Databricks cluster to forward logs

To configure your Azure Databricks cluster to forward logs to your LOGIQ endpoint, do the following.
  • Navigate to the Compute section on your Azure portal.
  • Click Create Cluster.
  • Choose your cluster size.
  • Click Advanced options > SSH and paste your public key under SSH public key. You can generate a key pair by running the command ssh-keygen -t rsa -b 4096 -C "email-id". You will use the matching private key to log into the machine later on.
  • Next, on the Azure portal, under Network security group, add an inbound rule for port 2200 in the Inbound ports section for the machines that the Databricks cluster spins up, as shown in the sketch below.
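You can also add the rule from the command line. The following is a minimal Azure CLI sketch; the resource group, NSG name, rule name, and priority (logiq-rg, databricks-nsg, Allow-Databricks-SSH, 1000) are hypothetical placeholders, so substitute the values from your own deployment.

# Allow inbound TCP 2200 (the Databricks SSH port) on the cluster's NSG
az network nsg rule create \
  --resource-group logiq-rg \
  --nsg-name databricks-nsg \
  --name Allow-Databricks-SSH \
  --priority 1000 \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --destination-port-ranges 2200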

Installing and configuring Fluent Bit

To install and configure Fluent Bit on your Databricks cluster, do the following.
  • Log into the machine using the following command, where <machine-ip> is the address of the Databricks node. Databricks cluster nodes typically use ubuntu as the SSH user.
ssh ubuntu@<machine-ip> -p 2200 -i <private_key_file_path>
  • Install Fluent Bit as per the version of Ubuntu OS running on the machine. For detailed installation instructions, refer to the Fluent Bit documentation; a sketch for Ubuntu 20.04 follows.
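As an illustration, the following installs the td-agent-bit package on Ubuntu 20.04; this sketch assumes the focal package repository, so substitute the codename of your Ubuntu release.

# Add the Fluent Bit signing key and package repository (Ubuntu 20.04 "focal" shown)
wget -qO - https://packages.fluentbit.io/fluentbit.key | sudo apt-key add -
echo "deb https://packages.fluentbit.io/ubuntu/focal focal main" | sudo tee /etc/apt/sources.list.d/td-agent-bit.list

# Install the td-agent-bit package
sudo apt-get update
sudo apt-get install -y td-agent-bit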
  • Use the following Fluent Bit configuration file.
[SERVICE]
    Flush 1
    Parsers_File /etc/td-agent-bit/parsers.conf
    Log_Level debug

[INPUT]
    Name tail
    Path /dbfs/cluster-logs/*/driver/stdout*
    Tag driver-stdout
    Buffer_Max_Size 1MB
    Ignore_Older 5m

[INPUT]
    Name tail
    Path /dbfs/cluster-logs/*/driver/*.log
    Tag driver-log4j
    Buffer_Max_Size 1MB
    Ignore_Older 5m

[INPUT]
    Name tail
    Path /dbfs/cluster-logs/*/driver/stderr*
    Tag driver-stderr
    Buffer_Max_Size 1MB
    Ignore_Older 5m

[INPUT]
    Name tail
    Path /dbfs/cluster-logs/*/eventlog/*/*/eventlog
    Tag eventlog
    Buffer_Max_Size 1MB
    Ignore_Older 5m

[INPUT]
    Name tail
    Path /dbfs/cluster-logs/*/executor/*/*/stdout*
    Tag executor-stdout
    Buffer_Max_Size 1MB
    Ignore_Older 5m

[INPUT]
    Name tail
    Path /dbfs/cluster-logs/*/executor/*/*/stderr*
    Tag executor-stderr
    Buffer_Max_Size 1MB
    Ignore_Older 5m

[FILTER]
    Name record_modifier
    Match driver-stdout
    Record AppName driver-stdout

[FILTER]
    Name record_modifier
    Match eventlog
    Record AppName eventlog

[FILTER]
    Name record_modifier
    Match driver-stderr
    Record AppName driver-stderr

[FILTER]
    Name record_modifier
    Match driver-log4j
    Record AppName driver-log4j

[FILTER]
    Name record_modifier
    Match executor-stdout
    Record AppName executor-stdout

[FILTER]
    Name record_modifier
    Match executor-stderr
    Record AppName executor-stderr

[FILTER]
    Name record_modifier
    Match *
    Record cluster_id Linux
    Record linuxhost ${HOSTNAME}
    Record namespace Databricks-worker

[FILTER]
    Name modify
    Match *
    Rename ident AppName
    Rename procid proc_id
    Rename pid proc_id

[FILTER]
    Name parser
    Match *
    Key_Name data
    Parser syslog-rfc3164
    Reserve_Data On
    Preserve_Key On

[OUTPUT]
    name stdout
    match *

[OUTPUT]
    name http
    match *
    host <logiq-endpoint>
    port 443
    URI /v1/json_batch
    Format json
    tls on
    tls.verify off
    net.keepalive off
    compress gzip
    Header Authorization Bearer <TOKEN>
  • In the Fluent Bit configuration file above, substitute the following details based on your implementation, as in the example after this list.
    • logiq-endpoint: your LOGIQ endpoint
    • TOKEN: the bearer token used to authenticate with LOGIQ
    • Databricks-worker: the namespace value attached to each record
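For example, with a hypothetical endpoint logiq.example.com, a hypothetical token abc123, and a namespace of my-databricks-workers, the substituted lines would read:

    Record namespace my-databricks-workers
    ...
    host logiq.example.com
    ...
    Header Authorization Bearer abc123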
  • Next, replace the existing configuration at /etc/td-agent-bit/td-agent-bit.conf with the modified file. You can optionally test the new configuration before restarting the service, as shown below.
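To catch configuration errors early, you can run Fluent Bit in the foreground against the new file. This assumes the default binary location used by the td-agent-bit package; press Ctrl+C to stop the test run.

/opt/td-agent-bit/bin/td-agent-bit -c /etc/td-agent-bit/td-agent-bit.conf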
  • Finally, restart Fluent Bit by running the following command.
sudo systemctl restart td-agent-bit
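To confirm that the service restarted cleanly and is tailing the configured log paths, check its status and follow its journal:

sudo systemctl status td-agent-bit
sudo journalctl -u td-agent-bit -f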
Now, when you log into your LOGIQ UI, you should see the logs from your Azure Databricks cluster being ingested. See the Explore Section to view the logs.