Python Data Science: Analysis, Wrangling, and Visualization
Data science in Python revolves around a powerful stack: pandas for data manipulation, NumPy for numerical computing, Matplotlib and Seaborn for visualization, and scikit-learn for modeling. Whether you are cleaning messy CSVs, joining datasets, computing statistics, or building dashboards, these tools form the foundation.
This collection covers the full data science workflow, from loading and wrangling data through analysis and visualization to integration with databases and newer tools such as Polars and PySpark.
Tutorials marked with the cert badge include a final exam that awards a certificate of completion you can download and share.
Python for Data Science
Overview of the Python data science ecosystem and the role each library plays.
DataFrames in Python
Understanding DataFrame structure, creation, indexing, and basic operations.
Working with pandas DataFrames
Practical pandas operations: filtering, grouping, aggregation, and transformation.
Joining Data Structures with pandas
Combining DataFrames from multiple sources with merge, join, and concat.
NumPy Multidimensional Arrays and Matrices
NumPy array creation, operations, broadcasting, and linear algebra fundamentals.
Python Array Computation Libraries
Comparing NumPy, CuPy, JAX, and other array computation options.
Data Wrangling with Python
Cleaning, reshaping, and preparing real-world data for analysis.
Data Normalization in Python
Min-Max scaling, Z-score standardization, robust scaling, and scikit-learn pipelines.
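As a quick taste of the three scaling methods named above, here is a minimal standard-library sketch of the arithmetic. The function names and the sample data are illustrative assumptions, not code from the tutorial, and no scikit-learn scaler is used here.

```python
from statistics import mean, median, pstdev, quantiles

def min_max_scale(values):
    # Map values linearly onto [0, 1]; assumes max(values) != min(values).
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    # Center on the mean and divide by the population standard deviation.
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def robust_scale(values):
    # Center on the median and divide by the interquartile range,
    # which makes the result less sensitive to outliers.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    med = median(values)
    return [(v - med) / (q3 - q1) for v in values]

# Hypothetical sample with one large outlier.
data = [2.0, 4.0, 6.0, 8.0, 100.0]

scaled_minmax = min_max_scale(data)
scaled_z = z_score(data)
scaled_robust = robust_scale(data)
```

With the outlier present, Min-Max scaling squeezes the first four values close to 0, while robust scaling keeps them evenly spaced around 0.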
How to Parse CSV in Python
Reading and writing CSVs with the csv module and pandas, including handling edge cases.
Understanding Pipelines in Python
Building data processing pipelines for reproducible, maintainable analysis workflows.
Seaborn in Python
Statistical visualization with Seaborn: distributions, relationships, categories, and custom styling.
Python statsmodels
Statistical modeling, hypothesis testing, regression analysis, and time series with statsmodels.
Analyzing Financial Data with Python
Working with financial datasets: time series, returns, moving averages, and risk metrics.
Python Financial Data Smoothing
Smoothing techniques for noisy financial data: moving averages, exponential smoothing, and filters.
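For a rough idea of the two most common techniques listed above, the sketch below implements a simple moving average and single exponential smoothing in plain Python. The function names, the window size, the `alpha` value, and the price series are all illustrative assumptions.

```python
def simple_moving_average(values, window):
    # Average each run of `window` consecutive values; the result is
    # shorter than the input by window - 1 elements.
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

def exponential_smoothing(values, alpha):
    # Each smoothed point blends the new observation with the previous
    # smoothed value; alpha near 1 tracks the data closely, alpha near 0
    # smooths more heavily. The first value seeds the series.
    smoothed = [values[0]]
    for v in values[1:]:
        smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical closing prices.
prices = [10.0, 12.0, 11.0, 13.0, 12.0]

sma = simple_moving_average(prices, 3)
ema = exponential_smoothing(prices, 0.5)
```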
Using Python in Power BI: The Complete, No-Nonsense Guide
All three integration modes (data source, Power Query transformation, visual), the PythonScriptWrapper mechanics, Service runtime constraints, the May 2026 deprecation, Microsoft Fabric Semantic Link, and four real-world use case walkthroughs. Approximately 2.5 hours. Certificate of completion available.
SQL with Python
Connecting to databases, executing queries, and integrating SQL with pandas workflows.
cursor.execute() in Python Database Programming
Low-level database interaction with cursor objects, parameterized queries, and transaction management.
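The pattern this tutorial covers can be sketched with the standard library's sqlite3 module. The in-memory database, the `users` table, and the sample names are assumptions made for illustration; other drivers follow the same cursor interface but may use a different placeholder style than `?`.

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.commit()

# Parameterized query: the value travels separately from the SQL text,
# so it is never interpolated into the statement string.
cur.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
conn.commit()

# Transaction management: both inserts succeed together or not at all.
try:
    cur.execute("INSERT INTO users (name) VALUES (?)", ("bob",))
    cur.execute("INSERT INTO users (name) VALUES (?)", (None,))  # violates NOT NULL
    conn.commit()
except sqlite3.IntegrityError:
    conn.rollback()  # undoes the "bob" insert as well

cur.execute("SELECT name FROM users ORDER BY id")
names = [row[0] for row in cur.fetchall()]
conn.close()
```

After the rollback, only the row committed before the failed transaction remains.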
Python oracledb Guide
Connecting to Oracle databases with the oracledb driver.
Understanding Polars in Python
Polars as a high-performance alternative to pandas with lazy evaluation and parallel execution.
PyArrow: Columnar Engine for Python Data
Apache Arrow's role in the Python data ecosystem for zero-copy data exchange.
PySpark Window Functions
Window functions in PySpark for ranking, running totals, and partitioned computations.
Partition Columns in Python
How Hive-style partitioning works at the filesystem level and how to write and read partitioned Parquet datasets with pandas, PyArrow, PySpark, Polars, and DuckDB.
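To illustrate the directory layout only, the sketch below writes and reads a Hive-style `key=value` tree using the standard library. It deliberately uses CSV files instead of Parquet so that it runs without third-party packages; the helper names, the `part-0.csv` file name, and the sample rows are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical sample rows; every value is kept as a string for simplicity.
rows = [
    {"year": "2023", "country": "US", "sales": "10"},
    {"year": "2023", "country": "DE", "sales": "7"},
    {"year": "2024", "country": "US", "sales": "12"},
]
partition_cols = ["year", "country"]

def write_partitioned(rows, root, partition_cols):
    # Partition values go into directory names (key=value) and are
    # dropped from the data file itself.
    for row in rows:
        part_dir = root.joinpath(*[f"{c}={row[c]}" for c in partition_cols])
        part_dir.mkdir(parents=True, exist_ok=True)
        data = {k: v for k, v in row.items() if k not in partition_cols}
        path = part_dir / "part-0.csv"
        is_new = not path.exists()
        with path.open("a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(data))
            if is_new:
                writer.writeheader()
            writer.writerow(data)

def read_partitioned(root):
    # Recover the partition columns by parsing the directory names.
    out = []
    for path in sorted(root.rglob("*.csv")):
        parts = dict(p.split("=", 1) for p in path.relative_to(root).parts[:-1])
        with path.open(newline="") as f:
            for rec in csv.DictReader(f):
                out.append({**parts, **rec})
    return out

root = Path(tempfile.mkdtemp())
write_partitioned(rows, root, partition_cols)
layout = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.csv"))
restored = read_partitioned(root)
```

Because the partition values live in the paths, a reader that filters on `year` can skip whole directories without opening any files in them.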
Python vs R Programming
Comparing Python and R for data science tasks: the strengths of each and when to use which.