Cloudera Primary User Personas
Cloudera has defined the set of personas described in this topic. These personas are characters based on real people, and each persona represents a user type. Together, they help define the goals and activities of typical users of Cloudera products. Defining personas for software products is a moving target because user types change over time. This collection is the result of a 2018 study that gathered data from about fifteen leaders in Cloudera product management and engineering. These primary personas are being validated with some customers to ensure their accuracy and will be updated as needed.
Infrastructure
The personas in this group use either Cloudera Manager or Altus to manage CDH clusters on premises or in the cloud.
Jim — Senior Hadoop Administrator
Skills and Background
- Very strong knowledge of HDFS and Linux administration
- Understanding of:
- Distributed/grid computing
- VMs and their capabilities
- Racks, disk topologies, and RAID
- Hadoop architecture
- Proficiency in Java
Tools:
Cloudera
- Cloudera Manager/CDH
- Navigator
- BDR
- Workload XM
Third-party Tools: Configuration management and log monitoring tools, for example, Splunk, Puppet, Chef, Ganglia, or Grafana
Goals:
- Achieve consistent high availability and performance on Hadoop clusters
- Administer users, including creating new users and updating access control rights on demand
Typical Tasks:
- Monitor cluster performance to ensure a high percentage of uptime
- Back up and replicate appropriate files to ensure disaster recovery
- Schedule and perform cluster upgrades
- Enable and check the status of security services and configurations
- Analyze query performance with Workload XM to ensure optimum cluster performance
- Provision new clusters
Jen — Junior Hadoop Administrator
Skills and Background
- Basic knowledge of HDFS
- Limited knowledge of Linux (shell scripting mostly)
- General understanding of:
- Distributed/grid computing
- VMs and their capabilities
- Racks, disk topologies, and RAID
- Hadoop architecture
Tools:
Cloudera
- Cloudera Manager/CDH
- Navigator
- Workload XM
Third-party Tools: Configuration management and log monitoring tools, for example, Splunk, Puppet, Chef, Ganglia, or Grafana
Goals:
- Maintain high availability and performance of Hadoop clusters
Typical Tasks:
- Perform basic procedures to ensure clusters are up and running
- Perform maintenance workflows
Sarah — Cloud Administrator
Skills and Background
- Understands public cloud primitives (Virtual Private Cloud)
- Understands security access policies (Identity and Access Management)
- Proficiency in Java
Tools:
Cloudera
- Altus
Third-party Tools: Amazon Web Services, Microsoft Azure
Goals:
- Maintain correct access to cloud resources
- Maintain correct allocation of cloud resources, such as account limits
Typical Tasks:
- Create the Altus environment for the organization
Data Ingest, ETL, and Metadata Management
The personas in this group typically use Navigator, Workload XM, HUE, Hive, Impala, and Spark.
Terence — Enterprise Data Architect or Modeler
Skills and Background
- Experience with:
- ETL process
- Data munging
- Wide variety of data wrangling tools
Tools:
Cloudera
- Navigator
- Workload XM
- HUE
- Hive
- Impala
- Spark
Third-party Tools: ETL and other data wrangling tools
Goals:
- Maintain an organized, optimized enterprise data architecture that supports business needs
- Ensure that data models support improved data management and consumption
- Maintain efficient schema design
Typical Tasks:
- Organize data at the macro level: set architectural principles, create data models, create key entity diagrams, and create a data inventory to support business processes and architecture
- Organize data at the micro level: create data models for specific applications
- Map organization use cases to execution engines (Impala, Spark, Hive)
- Provide logical data models for the most important data sets, consuming applications, and data quality rules
- Provide data entity descriptions
- Ingest new data into the system: use ingest tools, monitor ingestion rate, data formatting, and partitioning strategies
Kara — Data Steward and Data Curator
Skills and Background
- Experience with:
- ETL process
- Data wrangling tools
Tools:
Cloudera
- Navigator
- HUE data catalog
Third-party Tools: ETL and other data wrangling tools
Goals:
- Maintain metadata (technical and custom)
- Maintain data policies to support business processes
- Maintain data lifecycle at Hadoop scale
- Maintain data access permissions
Typical Tasks:
- Manage technical metadata
- Classify data at Hadoop scale
- Create and manage custom and business metadata using policies or third-party tools that integrate with Navigator
Analytics and Machine Learning
The personas in this group typically use Cloudera Data Science Workbench (CDSW), HUE, HDFS, and HBase.
Song — Data Scientist
Skills and Background
- Statistics
- Related scripting tools, for example, R
- Machine learning models
- SQL
- Basic programming
Tools:
Cloudera
- CDSW
- HUE to build and test queries before adding to CDSW
- HDFS
- HBase
Third-party Tools: R, SAS, SPSS, and others; scripting languages such as Scala and Python; BI tools such as Tableau and Qlik; and some Java
Goals:
- Solve business problems by applying advanced analytics and machine learning in an ad hoc manner
Typical Tasks:
- Access, explore, and prepare data by joining and cleaning it
- Define data features and variables to solve business problems (feature engineering)
- Select and adapt machine learning models or write algorithms to answer business questions
- Tune data model features and hyperparameters while running experiments
- Publish the optimized model as an API so that BI Analysts or Data Owners can use it in their reporting
- Publish data model results to answer business questions for consumption by Data Owners and BI Analysts
Jason — Machine Learning Engineer
Skills and Background
- Machine learning and big data skills
- Software engineering
Tools:
Cloudera
- Spark
- HUE to build and test queries before adding to application
- CDSW
Third-party Tools: Java
Goals:
- Build and maintain production machine learning applications
Typical Tasks:
- Set up big data machine learning projects at companies such as Facebook
Cory — Data Engineer
Skills and Background
- Software engineering
- SQL mastery
- ETL design and big data skills
- Machine learning skills
Tools:
Cloudera
- CDSW
- Spark/MapReduce
- Hive
- Oozie
- Altus Data Engineering
- HUE
- Workload XM
Third-party Tools: IDE, Java, Python, Scala
Goals:
- Create data pipelines (about 40% of working time)
- Maintain data pipelines (about 60% of working time)
Typical Tasks:
- Create data workflow paths
- Create code repository check-ins
- Create XML workflows for production system launches
Sophie — Application Developer
Skills and Background
- Deep knowledge of software engineering to build real-time applications
Tools:
Cloudera
- HBase
Third-party Tools: Various software development tools
Goals:
- Develop applications that run and successfully send workloads to the cluster, for example, an application that connects a front end to HBase on the cluster
Typical Tasks:
- Develop application features. Sophie does not write the SQL workload; rather, she writes the application that sends the workloads to the cluster.
- Test applications to ensure that they run successfully
Abe — SQL Expert/SQL Developer
Skills and Background
- Deep knowledge of SQL dialects and schemas
Tools:
Cloudera
- HUE
- Cloudera Manager to monitor Hive queries
- Hive via command line or HUE
- Impala via HUE, another BI tool, or the command line
- Navigator via HUE
- Sentry via HUE
- Workload XM via HUE
Third-party Tools: SQL Studio, TOAD
Goals:
- Create workloads that perform well and that return the desired results
Typical Tasks:
- Create query workloads that applications send to the cluster
- Ensure optimal performance of query workloads by monitoring the query model and partitioning strategies
- Prepare and test queries before they are added to applications
Kiran — SQL Analyst/SQL User
Skills and Background
- Has high-level grasp of SQL concepts, but prefers to drag and drop query elements
- Good at data visualization, but prefers pre-populated tables and queries
Tools:
Cloudera
- HUE
- Cloudera Manager to monitor queries
- Oozie to schedule workloads
- Impala (rather than Hive)
Third-party Tools: Reporting and business intelligence tools like Cognos, Crystal Reports
Goals:
- Answer business questions and solve business problems based on data
Typical Tasks:
- Create query workloads that applications send to the cluster
- Ensure optimal performance of queries (query model, partitioning strategies)
Christine — BI Analyst
Skills and Background
- Ability to:
- View reports and drill down into results of interest
- Tag, save, share reports and results
Tools:
Cloudera
- HUE
- Navigator via HUE
Third-party Tools: SQL query tools, Tableau, Qlik, Excel
Goals:
- Apply data preparation and analytic skills to solve recurrent business problems, for example, creating a weekly sales report
- Provide reports for the Business/Data Owner
Typical Tasks:
- Access, explore, and prepare data by joining and cleaning it
- Create reports to satisfy requests from business stakeholders to solve business problems