Contents

Introduction xvii

Data 2
Storage 3
Computing Power 5
Careers in Analytics 5
The Analytics Process 6
Data Acquisition 7
Cleaning and Manipulation 7
Analysis 7
Visualization 8
Reporting and Communication 9
Analytics Techniques 10
Descriptive Analytics 10
Inferential Analytics 10
Predictive Analytics 10
Prescriptive Analytics 11
Machine Learning, Artificial Intelligence, and Deep Learning 11
Generative AI 12
Robotic Process Automation 13
Data Governance 13
Analytics Tools 14
Summary 16

Chapter 2 Data Analytics Tools 17
Spreadsheets 18
Microsoft Excel 19
Programming Languages 21
R 21
Python 23
Scala 24
SAS 24
Databases and SQL 26
Business Intelligence Software 29
Power BI 29
Tableau 29
Looker 31
Cloud Infrastructure 32
Drivers for Cloud Computing 32
Cloud Service Concepts 33
Cloud Deployment Models 35
Public Cloud Providers 36
Summary 37
Exam Essentials 37
Review Questions 39

Chapter 3 Understanding Data 43
Exploring Data Types 44
Structured Data Types 46
Unstructured Data Types 58
Categories of Data 63
Common Data Structures 66
Structured Data 66
Unstructured Data 68
Semi-structured Data 69
Common File Formats 70
Text Files 70
JavaScript Object Notation 72
Extensible Markup Language (XML) 74
Hypertext Markup Language (HTML) 75
Summary 76
Exam Essentials 77
Review Questions 78

Chapter 4 Databases and Data Acquisition 83
Exploring Databases 84
The Relational Model 85
Relational Databases 88
Nonrelational Databases 94
Database Use Cases 97
Online Transactional Processing 97
Online Analytical Processing 100
Schema Concepts 101
Data Acquisition Concepts 107
Integration 107
Data Sources and Collection Methods 109
Working with Data 120
Data Manipulation 121
Query Optimization 136
Summary 139
Exam Essentials 140
Review Questions 141

Chapter 5 Data Quality 145
Data Inconsistencies 146
Data Duplication 146
Data Redundancy 147
Missing Values 151
Invalid Data 152
Nonparametric Data 153
Data Outliers 153
Specification Mismatch 154
Data Type Validation 155
Data Completeness 156
Data Transformation Techniques 156
String Manipulation 156
Conversion 158
Augmentation 160
Scaling 160
Grouping Techniques 162
Reduction 163
Aggregation 166
Transposition 167
Exploding 168
Standardization 168
Imputation 171
Parsing 172
Merging 174
Appending 175
Recoding Data 176
Derived Variables 177
Deletion 178
Data Blending 178
Managing Data Quality 180
Circumstances to Check for Quality 180
Automated Validation 182
Data Quality Dimensions 183
Data Quality Rules and Metrics 185
Methods to Validate Quality 188
Summary 190
Exam Essentials 191
Review Questions 192

Chapter 6 Data Analysis and Statistics 197
Communication Approaches 198
Audience 198
Mock-Up 201
Accessibility 201
Statistical Functions and Measures 204
Mathematical 205
Logical 223
Date 226
String 227
Troubleshooting 229
Issues 229
Tools and Methods 232
Analysis Techniques 234
Determine Type of Analysis 235
Types of Analysis 235
Exploratory Data Analysis 236
Summary 237
Exam Essentials 239
Review Questions 240

Chapter 7 Data Visualization with Reports and Dashboards 245
Exploring Visualization Elements 246
Charts 246
Maps 252
Pivot Tables 255
Infographic 258
Waterfall 259
Word Cloud 263
Understanding Business Requirements 263
Understanding Design Elements 267
Cover Page 268
Executive Summary 269
Branding 269
Documentation Elements 277
Understanding Dashboard Development Methods 279
Consumer Types 279
Data Source Considerations 280
Data Type Considerations 281
Development Process 282
Operational Considerations 282
Delivery Considerations 283
Static and Dynamic Delivery 283
Frequency 284
Data Versioning Techniques 286
Tactical and Research 287
Report Validation Techniques 288
Issues 288
Techniques 289
Reviews 289
Source Validation 290
Data Structures 290
Monitoring Alerts 291
Summary 291
Exam Essentials 293
Review Questions 295

Chapter 8 Data Governance 299
Data Management Concepts 300
Integration 301
Documentation 302
Source of Truth 309
Data Versioning 313
Metadata 313
Data Governance Roles 313
Data Compliance Concepts 315
National Institute of Standards and Technology (NIST) 315
Retention 316
Jurisdictional Requirements 316
Replication 317
Storage 318
Data Ethics 318
Payment Card Industry (PCI) 319
Personal Identifiable Information (PII) 320
Protected Health Information (PHI) 320
Audit 323
Classification 323
Incident Reporting 325
Data Privacy and Protection 325
Role-Based Access Control (RBAC) 326
Encryption 327
Masking 332
Anonymization 332
Data Usage 333
Data Sharing 334
Data Quality Assurance Practices 334
International Organization for Standardization (ISO) 335
Source Control 335
Unit Test 336
Requirement Testing 337
Stress Test 337
User Acceptance Testing (UAT) 338
Data Health Check 339
Automated Data Quality Monitoring 339
Data Profiling 341
Summary 342
Exam Essentials 344
Review Questions 345

Index 349
CompTIA Data+ Study Guide: Exam DA0-002