Welcome to the Python tutorial! This section covers the core concepts of Python for data analysis, explained in simple terms with short examples.
Chapter 1: Python Basics
Python is a popular programming language known for its simple and easy-to-read style. You can use it to give instructions to a computer. To get started, you can use the `print()` function to display a message.
print("Hello, World!")
In Python, you can store information in variables. Think of a variable as a labeled box where you can keep different types of data. Python infers the type from the value you assign, so you do not declare it yourself. The most common basic types are:
- String (`str`): Text, surrounded by quotes. e.g., `"Hello"`
- Integer (`int`): Whole numbers. e.g., `25`
- Float (`float`): Numbers with decimals. e.g., `19.99`
- Boolean (`bool`): Represents `True` or `False`.
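As a quick check of the types listed above, the built-in `type()` function reports what Python inferred for a value. The variable names below are illustrative only.

```python
# Illustrative variables, one for each basic type
title = "Hello"
quantity = 25
price = 19.99
in_stock = True

print(type(title))     # <class 'str'>
print(type(quantity))  # <class 'int'>
print(type(price))     # <class 'float'>
print(type(in_stock))  # <class 'bool'>
```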
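An f-string (formatted string literal, available since Python 3.6) is a concise way to include variables in your print statements: prefix the string with `f` and put each variable name inside curly braces.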
# This is a comment. Python ignores it.
name = "Alex"
age = 25
is_student = True # This is a boolean
# Using an f-string to easily combine text and variables
print(f"My name is {name} and I am {age} years old.")
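f-strings can also evaluate expressions and control how numbers are displayed. A small sketch, with made-up values, of the `:.2f` format specifier, which rounds the displayed number to two decimal places:

```python
# Made-up values for illustration
price = 19.987
quantity = 3

# Any expression can go inside the braces; :.2f shows two decimal places
line = f"Total: {price * quantity:.2f}"
print(line)  # Prints Total: 59.96
```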
Chapter 2: Python Data Structures
Python has several built-in ways to store collections of data. The most common are Lists, Dictionaries, Tuples, and Sets.
A List is an ordered and changeable collection of items. You create a list by placing items inside square brackets `[]`.
fruits = ["apple", "banana", "cherry"]
# Accessing items (index starts at 0)
print(f"The first fruit is: {fruits[0]}")
# Changing an item
fruits[1] = "blueberry"
# Adding an item
fruits.append("orange")
print(f"Updated list: {fruits}")
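A few more everyday list operations may be worth seeing side by side. This sketch uses its own sample list and shows negative indexing, slicing, removing an item by value, and `len()`:

```python
fruits = ["apple", "banana", "cherry", "orange"]

last = fruits[-1]       # Negative indexes count from the end
first_two = fruits[:2]  # A slice returns a new list with items 0 and 1

fruits.remove("banana")  # Removes the first matching item
count = len(fruits)      # Number of items left in the list

print(last)       # Prints orange
print(first_two)  # Prints ['apple', 'banana']
print(count)      # Prints 3
```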
A Dictionary is a collection of key-value pairs in which each key is unique. Since Python 3.7, dictionaries preserve the order in which keys were inserted. You create dictionaries using curly braces `{}`.
person = {"name": "John", "age": 30}
# Accessing a value by its key
print(f"John's age is: {person['age']}")
# Adding a new entry
person["city"] = "New York"
# Looping through a dictionary
for key, value in person.items():
    print(f"{key}: {value}")
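Accessing a key that does not exist with square brackets raises a `KeyError`. Here is a minimal sketch of two safer options, using a dictionary like the one above. The `"email"` key is a hypothetical example of a missing key.

```python
person = {"name": "John", "age": 30}

# .get() returns a fallback value instead of raising KeyError
email = person.get("email", "not provided")
print(f"Email: {email}")  # Prints Email: not provided

# The 'in' operator checks whether a key exists
has_age = "age" in person
print(f"Has an age entry: {has_age}")  # Prints Has an age entry: True
```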
A Tuple is similar to a list in that it is ordered, but it is unchangeable (immutable). Once you create a tuple, you cannot add, remove, or replace its items. You use parentheses `()` for tuples.
# A tuple of coordinates
coordinates = (10.0, 20.0)
print(f"Latitude: {coordinates[0]}")
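To make the "unchangeable" part concrete, the sketch below unpacks a tuple into separate variables and then shows that assigning to an item raises a `TypeError`. It relies on `try...except`, which is covered in Chapter 7. The `longitude` name is an assumption about what the second value means.

```python
coordinates = (10.0, 20.0)

# Unpacking assigns each item to its own variable
latitude, longitude = coordinates
print(f"Latitude: {latitude}, Longitude: {longitude}")

# Trying to change an item raises a TypeError
try:
    coordinates[0] = 15.0
    changed = True
except TypeError as error:
    changed = False
    print(f"Cannot modify a tuple: {error}")
```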
A Set is an unordered and unindexed collection of unique items. It automatically removes any duplicates.
numbers = {1, 2, 3, 3, 4, 2}
print(numbers) # Prints {1, 2, 3, 4}
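Sets are also handy for membership tests and for comparing two groups of items. A short sketch with made-up data, showing intersection, union, and difference:

```python
# Made-up example data
morning = {"apple", "banana", "cherry"}
evening = {"banana", "cherry", "orange"}

both = morning & evening          # Intersection: items in both sets
either = morning | evening        # Union: items in at least one set
only_morning = morning - evening  # Difference: items only in the first set

print("banana" in morning)  # Prints True
print(sorted(both))         # Prints ['banana', 'cherry']
```

Because sets are unordered, `sorted()` is used here to print the items in a predictable order.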
Chapter 3: Control Flow
Control flow is how you tell your program what to do next. You use `if/elif/else` statements to make decisions. You use loops to repeat actions. A `for` loop is great for iterating over a sequence (like a list), and a `while` loop repeats as long as a condition is true.
Inside loops, you can use `break` to exit the loop immediately, and `continue` to skip the current iteration and move to the next one.
# Example of if/elif/else
score = 85
if score >= 90:
    print("Grade: A")
elif score >= 80:
    print("Grade: B")
else:
    print("Grade: C")
# A 'while' loop with break
count = 0
while True: # This would be an infinite loop without a break
    print(f"Count is {count}")
    count += 1 # Same as count = count + 1
    if count >= 3:
        break # Exit the loop
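The `continue` statement can be sketched in a `for` loop like this. The example skips even numbers and uses `break` to stop once the numbers pass 7; the range and the cutoff are arbitrary choices for illustration.

```python
# Collect odd numbers from 1 to 9, stopping early after 7
odds = []
for number in range(1, 10):
    if number % 2 == 0:
        continue  # Skip even numbers and move to the next iteration
    if number > 7:
        break     # Leave the loop entirely
    odds.append(number)

print(odds)  # Prints [1, 3, 5, 7]
```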
Chapter 4: Functions
A function is a block of reusable code that performs a specific task, which helps you avoid repeating code. You define one with the `def` keyword and hand back a result with `return`. You can also provide default values for parameters, making your functions more flexible.
# A function with a default parameter
def greet(name, greeting="Hello"):
    return f"{greeting}, {name}!"
# Using the function
print(greet("Alice")) # Uses the default greeting
print(greet("Bob", "Good morning")) # Uses a custom greeting
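Arguments can also be passed by name, which often makes calls easier to read. A brief sketch that repeats the `greet` function so it runs on its own:

```python
def greet(name, greeting="Hello"):
    return f"{greeting}, {name}!"

# Keyword arguments name the parameter explicitly, so their order does not matter
message = greet(greeting="Welcome", name="Carol")
print(message)  # Prints Welcome, Carol!
```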
Chapter 5: List Comprehensions & Lambda Functions
Python offers powerful shortcuts for common tasks. A list comprehension is a shorter, more readable way to create a list, and it can even include conditions.
# Get all even numbers from 0 to 9, and square them
even_squares = [n * n for n in range(10) if n % 2 == 0]
print(even_squares) # Prints [0, 4, 16, 36, 64]
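For comparison, here is the same result built with an ordinary `for` loop. The comprehension packs these four lines into one.

```python
# Loop version of the comprehension
even_squares_loop = []
for n in range(10):
    if n % 2 == 0:
        even_squares_loop.append(n * n)

# Comprehension version
even_squares = [n * n for n in range(10) if n % 2 == 0]

print(even_squares_loop == even_squares)  # Prints True
```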
A lambda function is a small, anonymous function written as a single expression. Lambdas are useful when you need a simple function for a short task, like sorting a list of dictionaries by a specific key.
# A list of dictionaries
people = [{"name": "Homer", "age": 39}, {"name": "Bart", "age": 10}]
# Sort the list by age using a lambda function
people.sort(key=lambda person: person["age"])
print(people)
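Note that `.sort()` changes the list in place. If you want to keep the original order, the built-in `sorted()` accepts the same `key` argument and returns a new list. A sketch, with one extra made-up entry added to the data:

```python
# "Lisa" is an extra entry added for illustration
people = [
    {"name": "Homer", "age": 39},
    {"name": "Bart", "age": 10},
    {"name": "Lisa", "age": 8},
]

# sorted() returns a new list and leaves the original untouched
by_name = sorted(people, key=lambda person: person["name"])
names = [person["name"] for person in by_name]

print(names)              # Prints ['Bart', 'Homer', 'Lisa']
print(people[0]["name"])  # Prints Homer
```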
Chapter 6: Object-Oriented Programming (OOP) Basics
OOP is a way of organizing your code around "objects". A class is like a blueprint. The `__init__` method is a special function that runs when you create a new object, used to set up its initial properties (attributes).
class Car:
    # The constructor method
    def __init__(self, brand, model):
        self.brand = brand
        self.model = model
        self.is_running = False
    # A method to start the car
    def start_engine(self):
        self.is_running = True
        print(f"The {self.brand} {self.model}'s engine is running.")
my_car = Car("Toyota", "Camry")
my_car.start_engine()
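Each object created from a class keeps its own attribute values. The sketch below uses a trimmed copy of the `Car` class to show that starting one car does not affect another. The `stop_engine` method and the second car are hypothetical additions for illustration.

```python
class Car:
    def __init__(self, brand, model):
        self.brand = brand
        self.model = model
        self.is_running = False

    def start_engine(self):
        self.is_running = True

    # Hypothetical extra method, not part of the original class
    def stop_engine(self):
        self.is_running = False


first_car = Car("Toyota", "Camry")
second_car = Car("Honda", "Civic")

first_car.start_engine()
print(first_car.is_running)   # Prints True
print(second_car.is_running)  # Prints False
```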
Chapter 7: Error Handling
Python's `try...except` block lets you handle errors gracefully. You can also add a `finally` block, which will run no matter what, whether an error occurred or not. This is useful for cleanup actions, like closing a file.
file = None
try:
    file = open("some_file.txt", "r")
    content = file.read()
    print(content)
except FileNotFoundError:
    print("Error: The file was not found.")
finally:
    # Runs whether or not an error occurred; close the file only if it was opened
    if file is not None:
        file.close()
        print("File closed.")
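The same pattern works for other kinds of errors. This sketch catches a `ValueError` from `int()` and also uses the optional `else` clause, which runs only when no exception was raised. The `parse_age` helper is hypothetical.

```python
def parse_age(text):
    """Hypothetical helper: convert text to an int, or return None on bad input."""
    try:
        age = int(text)
    except ValueError:
        print(f"Error: {text!r} is not a whole number.")
        return None
    else:
        # Runs only when the try block raised no exception
        return age
    finally:
        # Runs in both cases, before the function returns
        print("Finished parsing.")


print(parse_age("42"))         # Prints Finished parsing. then 42
print(parse_age("forty-two"))  # Prints the error, Finished parsing., then None
```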
Chapter 8: File Handling
Python can read from and write to files. It's common to read a file line by line, especially for large files. The `with` statement ensures the file is closed properly even if errors occur.
# Write multiple lines to a file
with open("shopping_list.txt", "w") as file:
    file.write("Milk\n")
    file.write("Bread\n")
    file.write("Eggs\n")
# Read the file line by line
with open("shopping_list.txt", "r") as file:
    for line in file:
        print(line.strip()) # .strip() removes the trailing newline and any surrounding whitespace
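Opening a file with `"w"` overwrites whatever it already contains. To add to an existing file instead, you can open it in append mode, `"a"`. The sketch below works in a temporary folder so it leaves no files behind; the folder and the item names are illustrative.

```python
import os
import tempfile

# Use a temporary directory so the sketch does not leave files behind
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "shopping_list.txt")

    with open(path, "w") as file:
        file.write("Milk\n")

    # "a" opens the file for appending instead of overwriting
    with open(path, "a") as file:
        file.write("Butter\n")

    with open(path, "r") as file:
        items = [line.strip() for line in file]

print(items)  # Prints ['Milk', 'Butter']
```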
Chapter 9: Working with APIs
An API (Application Programming Interface) allows programs to communicate. When you get data from a web API using the third-party `requests` library, it's often in JSON format, which `response.json()` converts into Python dictionaries and lists. You can navigate through it to find the data you need.
import requests
# Get data about a specific user from a public API
response = requests.get("https://jsonplaceholder.typicode.com/users/1", timeout=10)
response.raise_for_status() # Raise an error if the server returned a failure status
user_data = response.json()
# Access nested data
print(f"Name: {user_data['name']}")
print(f"City: {user_data['address']['city']}")
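If you want to practice navigating JSON without a network connection, the standard-library `json` module can parse a JSON string directly. The sample text below is made up; it only mimics the nested shape used above.

```python
import json

# A hard-coded sample with made-up values, shaped like the nested data above
sample_text = '{"name": "Sample User", "address": {"city": "Sampletown"}}'
user_data = json.loads(sample_text)

name = user_data["name"]
city = user_data["address"]["city"]

# .get() avoids a KeyError when an optional field is missing
phone = user_data.get("phone", "unknown")

print(f"Name: {name}, City: {city}, Phone: {phone}")
```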
Chapter 10: Introduction to NumPy
NumPy is essential for numerical computing. You can create arrays in many ways and check their properties like shape and data type.
import numpy as np
# Create an array of numbers from 0 to 9
a = np.arange(10)
print(f"Array: {a}")
# Create a 2x3 array of zeros
b = np.zeros((2, 3))
print(f"\nArray of zeros:\n{b}")
# Check the shape of the array
print(f"\nShape of array 'a': {a.shape}")
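Beyond shape, two things set NumPy arrays apart from lists: every array has a single data type (`dtype`), and arithmetic applies to every element at once. A short sketch, assuming NumPy is installed:

```python
import numpy as np

a = np.arange(10)
print(a.dtype)  # An integer type; the exact name (such as int64) depends on the platform

# Reshape the 10 values into 2 rows and 5 columns
grid = a.reshape(2, 5)
print(grid.shape)  # Prints (2, 5)

# Arithmetic is applied element by element, with no explicit loop
doubled = a * 2
total = int(a.sum())
print(total)  # Prints 45
```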
Chapter 11: Introduction to Pandas
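Pandas is one of the most widely used libraries for data analysis in Python. Its central object is the DataFrame, a table of rows and named columns. You can build a DataFrame from a dictionary, but in practice it is more common to read one from a CSV (Comma-Separated Values) file. You can then use methods like `.info()` and `.describe()` to get a quick summary of your data.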
import pandas as pd
# Imagine you have a CSV file named 'sales.csv'
# df = pd.read_csv('sales.csv')
# For this example, we'll build a DataFrame from a dictionary instead
data = {'Product': ['A', 'B', 'A', 'C'], 'Sales': [100, 150, 120, 200]}
df = pd.DataFrame(data)
print("DataFrame Info:")
df.info()
print("\nStatistical Summary:")
print(df.describe())
Chapter 12: Data Wrangling with Pandas
Data cleaning is a critical step. With Pandas, you can easily find and remove duplicate rows, or drop entire columns that you don't need for your analysis.
data = {'Name': ['Tom', 'Nick', 'Tom'], 'Age': [20, 21, 20]}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# Remove duplicate rows
df_no_duplicates = df.drop_duplicates()
print("\nDataFrame without duplicates:")
print(df_no_duplicates)
# Drop the 'Age' column
df_no_age = df.drop(columns=['Age'])
print("\nDataFrame without the 'Age' column:")
print(df_no_age)
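Before removing duplicates, it can help to see which rows are affected. The `duplicated()` method returns `True` for each row that repeats an earlier one. A sketch using the same data, assuming pandas is installed:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Nick', 'Tom'], 'Age': [20, 21, 20]})

# duplicated() marks each row that repeats an earlier row
duplicate_mask = df.duplicated()
print(duplicate_mask.tolist())  # Prints [False, False, True]

duplicate_count = int(duplicate_mask.sum())

# drop_duplicates() returns a new DataFrame; df itself is unchanged
cleaned = df.drop_duplicates()
print(len(df), len(cleaned))  # Prints 3 2
```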
Chapter 13: Advanced Pandas Operations
The `groupby()` method splits a DataFrame into groups of rows that share the same value in one or more columns, so you can calculate summary statistics for each group. The `.agg()` method lets you compute several statistics at once.
data = {'City': ['NY', 'LA', 'NY', 'LA'],
        'Sales': [100, 150, 200, 50]}
df = pd.DataFrame(data)
# Group by city and get total and average sales
city_sales = df.groupby('City')['Sales'].agg(['sum', 'mean'])
print(city_sales)
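If you would like more descriptive column names in the result, pandas also supports named aggregation, where each output column is given as `name=(column, function)`. A sketch on the same data; the output column names `total_sales` and `average_sales` are arbitrary choices.

```python
import pandas as pd

df = pd.DataFrame({'City': ['NY', 'LA', 'NY', 'LA'],
                   'Sales': [100, 150, 200, 50]})

# Each keyword becomes a column in the result: name=(source column, function)
summary = df.groupby('City').agg(
    total_sales=('Sales', 'sum'),
    average_sales=('Sales', 'mean'),
)
print(summary)

ny_total = int(summary.loc['NY', 'total_sales'])
la_average = float(summary.loc['LA', 'average_sales'])
print(ny_total, la_average)  # Prints 300 100.0
```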
Chapter 14: Data Visualization with Matplotlib
Matplotlib allows for detailed customization of your plots. You can change colors, add markers, and create different types of plots like scatter plots to see relationships between variables.
import matplotlib.pyplot as plt
# Data for a scatter plot
study_hours = [2, 3, 5, 6, 8]
exam_scores = [65, 70, 80, 82, 90]
plt.scatter(study_hours, exam_scores, color='red')
plt.xlabel("Hours Studied")
plt.ylabel("Exam Score")
plt.title("Study Hours vs. Exam Score")
plt.show()
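As a sketch of the customization options mentioned above, the same data can be drawn as a line plot with markers, a dashed line style, and a legend. The example selects the non-interactive `Agg` backend and saves the figure to a file so that it runs without a display. The output filename is a hypothetical choice.

```python
import matplotlib
matplotlib.use("Agg")  # Non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

study_hours = [2, 3, 5, 6, 8]
exam_scores = [65, 70, 80, 82, 90]

fig, ax = plt.subplots()
# marker, linestyle, and color control how the line is drawn
ax.plot(study_hours, exam_scores, marker="o", linestyle="--",
        color="green", label="Scores")
ax.set_xlabel("Hours Studied")
ax.set_ylabel("Exam Score")
ax.set_title("Study Hours vs. Exam Score (line plot)")
ax.legend()

fig.savefig("study_hours_line.png")  # Hypothetical output filename
plt.close(fig)
```

In an interactive session you can drop the `matplotlib.use("Agg")` line and call `plt.show()` instead of saving the figure.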
This tutorial will help you build a strong foundation. Happy learning!