About
Welcome to the EH Nexus Data Science Software Skill Building page. As you advance in your career in environmental public health, you might need to improve your skills with the various software packages used in data science. These skills will help you work more efficiently, communicate more effectively, and gain new information to help improve the health of the populations you serve.
Overview
Data science software helps us identify key areas of concern, glean deep insights, and make data-driven decisions. Public health practitioners and other public health partners can build their data science skills through short courses that focus on specific software.
What's included
Below, you will find a series of self-paced learning modules that CDC personnel are using to learn about the tools they need for effective data analytics and visualization.
Each module starts with a core curriculum, which features foundational courses to help you get started with a specific software platform. Once you know the basics, consider exploring advanced pathways, which provide detailed information for different types of users.
The courses are periodically updated to add new content and to reflect software changes.
Software disclaimer
CDC realizes there are a wide variety of software packages and courses available to public health practitioners. Please check with your agency about availability. Some of the software may require purchase or have use or access restrictions. CDC does not officially endorse any product or course listed here.
Power BI
Power BI is a data visualization and business intelligence (BI) tool. It is most often used to create interactive dashboards that automatically refresh to give you real-time information. With this tool, you can create analytical predictions and shareable reports for different audiences. Using Power BI, practitioners of all skill levels can create their own visuals, publish content to shared workspaces, and collaborate with their colleagues.
The Power BI Core Curriculum is a series of foundational courses, sourced from vendors, intended to help you learn key concepts and prepare you to use Power BI on your platform.
What You Will Learn
- Understand Power BI's interface, value, and flow of work.
- Learn to connect Power BI to different data sources.
- Learn to create dashboards and reports.
- Understand how to publish and share reports and dashboards.
- Learn to create and customize visuals.
The Power BI Analyst Pathway is a series of courses that will help you use Power BI to create reports, visualizations, and analytic models. As a Power BI analyst, you'll learn to prepare and transform data, create complex visualizations, and use Data Analysis Expressions (DAX) to develop functions and calculations.
What You Will Learn
- Learn to create logical and aggregate functions using DAX.
- Learn to leverage the Power Query Editor to extract, transform, and combine data.
- Create custom visuals to improve your Power BI dashboards.
- Enhance visuals with trends, hierarchies, and slicers.
The Power BI Consumer Pathway is a series of courses that will help you use Power BI to explore content in Power BI Service, like reports and dashboards, to uncover and share new insights.
What You Will Learn
- Understand Power BI service basic concepts and terminology.
- Find your content in dashboards, reports, and apps.
- View and export data from dashboards and reports.
- Subscribe to a dashboard or report.
- Set a data alert.
The Power BI Developer Pathway is a series of courses that will help you use Power BI to embed analytics, create custom visuals, and stream datasets.
What You Will Learn
- Learn to create customized visuals.
- Learn how to publish and unpublish Power BI content for external partners.
- Learn how to use the REST API to integrate Power BI with external apps and automate management tasks.
These courses explore advanced or specialized Power BI topics once you have mastered the fundamentals.
Posit/RStudio
Posit offers multiple environments and resources to help data scientists work with their data regardless of the language they are using or the size of their team. The resources in this module focus on one of Posit's most popular products, RStudio. Users can leverage Posit's RStudio integrated development environment (IDE) through Posit Cloud, RStudio Desktop, and Posit Workbench on their agency's platform. In addition to sophisticated data analysis capabilities, Posit's RStudio enables users to easily produce high-quality visualizations and interactive web applications to explore, communicate, and share data.
Before getting started with one of the learning pathways, consider reviewing your agency's instructions for gaining access to the environments so you can practice what you are learning.
The Posit/RStudio Core Curriculum is a series of foundational courses to help you learn about the basics of programming and visualizing using RStudio and other Posit products.
What You Will Learn
- Learn how to install a local version of RStudio.
- Tour Posit's RStudio IDE user interface.
- Understand basic R functions and syntax.
- Understand the advantages of R as a programming language.
- Explore and practice using common R packages, including tidyverse.
- Learn how to create a publication-ready visualization.
The Posit/RStudio Analyst Pathway is a series of courses that will help you use RStudio and other Posit products to organize data and build interactive web applications. As a Posit/RStudio analyst, you will learn how to import, explore, analyze, and visualize data using the RStudio IDE.
What You Will Learn
- Learn to import, clean, and wrangle data.
- Learn to visualize, transform, and model data using the tidyverse.
- Use ggplot2 to create sophisticated visualizations.
- Use R to model statistical relationships.
- Use Shiny to create interactive presentations.
- Learn R's three main plotting systems (base graphics, lattice, and ggplot2) and when to use each to get the most out of your data.
The Posit/RStudio Consumer Pathway is a course that will help you use RStudio and other Posit products to understand Shiny web apps that were created by developers.
What You Will Learn
- Learn how to install Shiny.
- Tour the user interface.
- Visit the Shiny Dev Center to explore the possibilities of using Shiny with Posit.
- View and interact with web applications.
The Posit/RStudio Developer Pathway is a series of courses that will help you use RStudio and other Posit products to develop your own packages, interactive web applications, and web-based visualizations.
What You Will Learn
- Learn the basics of R Markdown and how to share your code.
- Create your own R packages.
- Build interactive web apps and visualizations using Plotly, R, and Shiny.
- Learn to connect to and query databases.
These courses explore advanced or specialized Posit/RStudio topics once you have mastered the fundamentals.
Tableau
Tableau enables a seamless end-to-end analytics experience, from preparing data, to building and sourcing dashboards, to publishing and sharing content. Tableau is ideal for exploring large data sets and transforming them into compelling visualizations and reports.
Tableau is a business intelligence tool that helps teams visualize and understand their data quickly. The tool seamlessly integrates with a range of data sources and servers, which allows users to easily handle large, heterogeneous datasets and identify hidden trends.
Tableau is a great option for users seeking to turn large datasets into compelling visualizations with robust drill-down capabilities. Before jumping into the learning pathways, consider reviewing your agency's access instructions for Tableau Desktop and Server.
The Tableau Core Curriculum is a series of foundational courses to help you learn key concepts and prepare you to visualize data with Tableau.
What You Will Learn
- Understand Tableau's key features.
- Learn how to navigate the Tableau interface.
- Learn how to create simple visualizations and dashboards.
- Learn how to share your work with colleagues.
The Tableau Developer Pathway is a series of courses that will help you leverage Tableau's advanced functionalities, such as calculations and data manipulation.
Some of the courses include datasets for download, so consider downloading them to follow along in your own environment.
What You Will Learn
- Understand the syntax for calculations.
- Leverage the calculations feature in Tableau.
- Learn to use parameters to create interactive visualizations.
- Create and use parameters for audience interactivity with your work.
- Embed using the JavaScript API.
- Add extensions to a dashboard.
- Develop dashboard extensions.
- Leverage the Tableau Server REST API to manage your resources.
These courses explore advanced or specialized topics once you have mastered the fundamentals.
Azure Data Factory
Azure Data Factory is a cloud-based extract, transform, and load (ETL) and data integration service that allows users to move data from a source to a destination. This application can help organize multiple activities into a single pipeline.
Azure Data Factory is a cloud-based ETL and data integration service that allows you to create data-driven workflows, called data pipelines.
With this tool, you can orchestrate multiple activities, such as moving and transforming data, within a single data pipeline. You can also manage, deploy, and schedule activities to run at specific intervals. This can help you complete complex, multi-step processes in a single run, significantly reducing the time it takes to complete your overall workload.
Currently, we only have suggested courses for the Azure Data Factory Core Curriculum pathway. Check back later for more role-based pathways.
The Azure Data Factory Core Curriculum is a collection of foundational courses paired with hands-on activities. You will complete activities in the Azure Data Factory sandbox (training environment). Together, these materials will help you learn key concepts and prepare you to use Azure Data Factory.
What You Will Learn
- Understand Azure Data Factory's components and value.
- Explain the data factory process.
- Describe data integration patterns.
- Use linked services and datasets.
- Describe data ingestion methods and security considerations.
- Understand and manage different types of integration runtimes.
- Create data factory activities and pipelines in your agency's Azure Data Factory sandbox.
- Extract data from its original source.
- Transform data by unzipping it and ensuring appropriate file paths.
- Load data into the target database.
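The extract, transform, and load steps listed above can be illustrated outside of Azure Data Factory with a minimal Python sketch. It is not Data Factory code: it assumes a zipped CSV as the source and an in-memory SQLite database as the target, and the file name, column names, and helper functions (`extract`, `transform`, `load`, `build_sample_archive`, `run_pipeline`) are hypothetical, chosen only to show how the three steps chain into one pipeline.

```python
import csv
import io
import sqlite3
import zipfile
from pathlib import PurePosixPath


def extract(archive_bytes: bytes) -> zipfile.ZipFile:
    """Extract step: open the zipped source (here, bytes already in memory)."""
    return zipfile.ZipFile(io.BytesIO(archive_bytes))


def transform(archive: zipfile.ZipFile) -> dict:
    """Transform step: unzip each CSV and reduce its file path to a table name."""
    tables = {}
    for name in archive.namelist():
        path = PurePosixPath(name)
        if path.suffix.lower() != ".csv":
            continue  # skip folder entries and non-CSV files
        text = archive.read(name).decode("utf-8")
        tables[path.stem.lower()] = list(csv.DictReader(io.StringIO(text)))
    return tables


def load(tables: dict, conn: sqlite3.Connection) -> None:
    """Load step: write each set of rows into a table in the target database."""
    for table, rows in tables.items():
        if not rows:
            continue
        columns = list(rows[0].keys())
        col_sql = ", ".join(f'"{c}"' for c in columns)
        placeholders = ", ".join("?" for _ in columns)
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_sql})')
        conn.executemany(
            f'INSERT INTO "{table}" ({col_sql}) VALUES ({placeholders})',
            [tuple(row[c] for c in columns) for row in rows],
        )
    conn.commit()


def build_sample_archive() -> bytes:
    """Create a small zipped CSV so the sketch runs without external files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("raw/2024/Air_Quality.csv", "county,pm25\nAdams,8.1\nBaker,12.4\n")
    return buffer.getvalue()


def run_pipeline() -> list:
    """Chain the three steps, then query the loaded table."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(build_sample_archive())), conn)
    rows = conn.execute(
        'SELECT county, pm25 FROM "air_quality" ORDER BY county'
    ).fetchall()
    conn.close()
    return rows
```

Calling `run_pipeline()` returns the two sample rows from the loaded `air_quality` table. In Azure Data Factory, each of these functions would correspond roughly to an activity in a pipeline rather than to hand-written code.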
Microsoft Synapse Analytics
Synapse is an enterprise analytics service that improves the data analytics processes across an organization.
Synapse is an evolving Microsoft platform that offers enterprise data warehousing, data integration, and big data analytics. Your agency may offer a limited version of Synapse that has dedicated and serverless SQL pools.
With the dedicated SQL pool, you can store data in relational tables with columnar storage, create views and tables, and run fast queries on stored datasets.
With the serverless SQL pool, you can quickly build views by querying and transforming large datasets stored in your data lake.
Currently, we only have suggested courses for a Synapse Core Curriculum pathway. Check back later for more role-based pathways.
The Synapse Core Curriculum is a set of overview courses. The training environment includes hands-on activities that you can complete in a dedicated and serverless SQL pool sandbox. These materials will help you learn key concepts and prepare you to use Synapse on your agency's platform.
What You Will Learn
- Summarize how Synapse's features and functionalities can help speed up data workflows.
- Compare the dedicated and serverless SQL pools to determine when to use each.
- Explain how views and tables can be used for your data operations.
- Execute basic T-SQL commands.
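The difference between a table and a view can be illustrated with a short Python sketch. It uses Python's built-in SQLite module as a stand-in, which is an assumption made so the example runs anywhere; Synapse SQL pools use T-SQL, and the syntax and behavior there may differ. The table name, view name, and data are hypothetical.

```python
import sqlite3


def demo_views_and_tables():
    """Show that a table stores rows while a view stores only a query."""
    conn = sqlite3.connect(":memory:")

    # A table physically holds the data.
    conn.execute("CREATE TABLE inspections (county TEXT, violations INTEGER)")
    conn.executemany(
        "INSERT INTO inspections VALUES (?, ?)",
        [("Adams", 2), ("Adams", 3), ("Baker", 1)],
    )

    # A view saves the query itself; it is re-evaluated each time it is read.
    conn.execute(
        "CREATE VIEW violations_by_county AS "
        "SELECT county, SUM(violations) AS total "
        "FROM inspections GROUP BY county"
    )

    query = "SELECT county, total FROM violations_by_county ORDER BY county"
    before = conn.execute(query).fetchall()

    # New rows in the table appear in the view without rebuilding it.
    conn.execute("INSERT INTO inspections VALUES ('Baker', 4)")
    after = conn.execute(query).fetchall()

    conn.close()
    return before, after
```

Here `demo_views_and_tables()` returns the view's results before and after a new row is added to the underlying table, showing that the view reflects the change automatically.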
SAS Viya
SAS Viya is a web-based analytics cloud environment for storing, transforming, visualizing, and publishing data/datasets. The SAS Viya ecosystem can be used for data management, data preparation, data analytics, model building, and visualizations to aid with various data science projects.
SAS Viya and SAS 9.4 offer end-to-end analytics, but they're not designed for the same users, environments, or futures.
SAS Viya provides quick, accurate, and reliable analytical insights in an environment that provides multi-threaded, parallel, distributed processing. SAS Viya also provides scalability, elasticity, and support for open-source language integration (e.g., Python, R).
Note: Your agency will need to purchase and provide a license for your SAS Viya sandbox environment. The courses are best used with access to a sandbox environment.
Currently, we only have suggested courses for a core curriculum pathway. Check back later for more role-based pathways.
The SAS Viya Core Curriculum is a collection of foundational courses sourced from the SAS Institute. These courses are intended to help you learn key concepts and identify the differences between SAS 9.4 and SAS Viya.
Note: To get the most out of the training, you will need to access your agency's SAS Viya sandbox/training environment.
What You Will Learn
- Learn how to access the SAS Viya Virtual Learning Environment and navigate the basic interface.
- Distinguish among the three SAS Viya licenses (SAS Visual Statistics, SAS Visual Analytics, and SAS Analytics Pro) and select the license most suitable for the use case.
Databricks
Databricks is a cloud-based software system built on Apache Spark. It is used to process and transform large quantities of data in a shared workspace. By managing automated clusters and writing code in Python-style notebooks, you can run interactive and scheduled data analysis workloads quickly and collaboratively.