Apache Zeppelin is a web-based notebook platform that enables interactive data analytics, data visualization, and notebook sharing.
Zeppelin natively supports LDAP/PAM based authentication and user role mapping using Apache Shiro. OAuth integration is not natively available, but recent versions add KnoxSSO support. Using KnoxSSO, we can integrate Zeppelin with any OAuth provider.
Apache Knox is an application gateway for interacting with the REST APIs and UIs of Apache Hadoop deployments. Knox supports OAuth authentication for Hadoop applications through the KnoxSSO service, an integration service that provides a normalized SSO token representing the authenticated user.
The Knox user guide is the best resource on how to install and configure Knox. If you are planning to use Google OAuth, you may have to build Knox from source because of the cookie size issue.
Create a client ID and client secret for a web application with your OAuth provider, using the following configuration.
Create a new topology for KnoxSSO in KNOX_PATH/conf/topologies/knoxsso.xml.
For OAuth support, pac4j can be used as the federation provider.
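A minimal sketch of a pac4j federation provider, assuming an OpenID Connect provider. The host names, client ID, client secret, and discovery URL are placeholders, not values from a real deployment; the provider block goes inside the gateway element of knoxsso.xml.

```xml
<provider>
  <role>federation</role>
  <name>pac4j</name>
  <enabled>true</enabled>
  <!-- KnoxSSO endpoint the OAuth provider redirects back to (placeholder host) -->
  <param>
    <name>pac4j.callbackUrl</name>
    <value>https://knox.example.com:8443/gateway/knoxsso/api/v1/websso</value>
  </param>
  <!-- Use the OpenID Connect client of pac4j -->
  <param>
    <name>clientName</name>
    <value>OidcClient</value>
  </param>
  <!-- Client ID and secret issued by the OAuth provider (placeholders) -->
  <param>
    <name>oidc.id</name>
    <value>YOUR_CLIENT_ID</value>
  </param>
  <param>
    <name>oidc.secret</name>
    <value>YOUR_CLIENT_SECRET</value>
  </param>
  <!-- Discovery document of the provider (placeholder URL) -->
  <param>
    <name>oidc.discoveryUri</name>
    <value>https://idp.example.com/.well-known/openid-configuration</value>
  </param>
  <param>
    <name>oidc.preferredJwsAlgorithm</name>
    <value>RS256</value>
  </param>
  <!-- Profile attribute used as the principal; here the email address -->
  <param>
    <name>pac4j.id_attribute</name>
    <value>email</value>
  </param>
</provider>
```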
The identity assertion provider plays the critical role of communicating the identity principal to be used. In the above configuration, email is configured in pac4j.id_attribute, which is passed as the username to Zeppelin. In some cases we need to transform this attribute to match the format we use internally. For example, we may need to transform firstname.lastname@example.com to firstname_lastname. The identity assertion provider can be used for these transformations. Identity assertion supports static principal mapping, concat, switch case, and regular expressions. To convert an email address to the name format, we can use a regular expression.
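The email-to-name conversion could be sketched with the Regex identity assertion provider as follows. The pattern is an illustrative assumption: it only matches addresses with exactly one dot before the @ sign, and it should be adapted to the address formats actually in use.

```xml
<provider>
  <role>identity-assertion</role>
  <name>Regex</name>
  <enabled>true</enabled>
  <!-- Capture the parts before and after the dot in the local part,
       e.g. firstname.lastname@example.com -> (firstname), (lastname) -->
  <param>
    <name>input</name>
    <value>([^.@]+)\.([^.@]+)@.*</value>
  </param>
  <!-- {1} and {2} refer to the first and second capture groups,
       producing firstname_lastname -->
  <param>
    <name>output</name>
    <value>{1}_{2}</value>
  </param>
</provider>
```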
To support role-based access in Zeppelin, such as admin-only access to the Interpreter settings, we need to map users to groups. Identity assertion provides HadoopGroupProvider, which supports multiple lookup methods such as Unix groups and LDAP. If the instance is already configured with Active Directory, ShellBasedUnixGroupsMapping can be used.
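A minimal sketch of HadoopGroupProvider with shell-based Unix group lookup, assuming the groups of each user can be resolved on the Knox host itself. A topology uses a single identity-assertion provider, so this block is an alternative to other identity assertion providers rather than an addition to them.

```xml
<provider>
  <role>identity-assertion</role>
  <name>HadoopGroupProvider</name>
  <enabled>true</enabled>
  <!-- Resolve groups with the operating system's own group lookup -->
  <param>
    <name>hadoop.security.group.mapping</name>
    <value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
  </param>
</provider>
```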
Add the KnoxSSO service to the topology.
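The service entry might look like the sketch below. The token lifetime and the whitelist pattern are illustrative assumptions; zeppelin.example.com is a placeholder for the host that users are redirected back to after login.

```xml
<service>
  <role>KNOXSSO</role>
  <!-- Only send the SSO cookie over HTTPS -->
  <param>
    <name>knoxsso.cookie.secure.only</name>
    <value>true</value>
  </param>
  <!-- Token lifetime in milliseconds (here 10 hours) -->
  <param>
    <name>knoxsso.token.ttl</name>
    <value>36000000</value>
  </param>
  <!-- URLs KnoxSSO is allowed to redirect to after authentication -->
  <param>
    <name>knoxsso.redirect.whitelist.regex</name>
    <value>^https?:\/\/(zeppelin\.example\.com|localhost|127\.0\.0\.1):[0-9].*$</value>
  </param>
</service>
```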
Create a new topology, knoxssout, in KNOX_PATH/conf/topologies/knoxssout.xml and add the following topology, which is used for logout.
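A minimal sketch of a logout topology, assuming the KNOXSSOUT service shipped with Knox is sufficient on its own. Depending on the Knox version and deployment, gateway providers may also be needed in this file.

```xml
<topology>
  <!-- Logout service; clears the KnoxSSO session cookie -->
  <service>
    <role>KNOXSSOUT</role>
  </service>
</topology>
```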
Zeppelin needs the public key of the Knox server to verify the token signature. Export the certificate from Knox (for example with knoxcli.sh export-cert --type PEM) and copy it to Zeppelin's conf folder.