Curriculum — K12STEMLABS AI × Biotechnology

🔬

FOUNDATION TRACK

Biology Meets Data

Weeks 1–6 · High School Level · Perfect for beginners

WEEKS

WK 01–02 Biology Meets Data

Biopython · NumPy

TOPICS

What is bioinformatics and why it matters
DNA, RNA, proteins as structured data
Python setup — NumPy, Pandas, Biopython
Parsing FASTA genome files
GC content and k-mer feature engineering
Building your first genome DataFrame

NCBI GenBank · Kaggle Genomics · 📄 PDF Guide

SAMPLE CODE

from Bio import SeqIO
import pandas as pd

# Load every record from the FASTA file
records = list(SeqIO.parse("genome.fasta", "fasta"))

# One row per record: ID, GC fraction, and sequence length
df = pd.DataFrame([{
    "id": r.id,
    "gc": (r.seq.count("G") + r.seq.count("C")) / len(r.seq),
    "length": len(r.seq),
} for r in records])

print(df.describe())
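The k-mer feature engineering topic listed above can be sketched in plain Python. This is a minimal illustration, not the course's own implementation: the function name `kmer_frequencies`, the choice of relative frequencies over raw counts, and the decision to ignore windows containing `N` or other ambiguity codes are all assumptions made for the example.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence: str, k: int = 3) -> dict[str, float]:
    """Return the relative frequency of every possible DNA k-mer (hypothetical helper)."""
    seq = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Fix the feature order so every sequence yields the same columns.
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Assumption: windows containing N or other ambiguity codes are ignored.
    total = sum(counts[km] for km in all_kmers)
    return {km: (counts[km] / total if total else 0.0) for km in all_kmers}


features = kmer_frequencies("ATGCGCGATATGC", k=2)
print(len(features))     # 16 possible 2-mers
print(features["GC"])    # fraction of windows equal to "GC"
```

Because every sequence maps to the same 4^k keys, the resulting dictionaries can be stacked directly into a DataFrame with one column per k-mer.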

WK 03–04 Machine Learning Essentials

scikit-learn · Random Forest

TOPICS

Supervised vs. unsupervised learning
Decision Trees and Random Forests
Train/test splits and cross-validation
Precision, Recall, F1, AUC-ROC
Classifying organisms from k-mer features

SAMPLE CODE

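from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# X: k-mer feature matrix, y: organism labels (prepared beforehand)
clf = RandomForestClassifier(n_estimators=200)
cv = StratifiedKFold(n_splits=5)

# Request F1 explicitly; the default score for classifiers is accuracy
scores = cross_val_score(clf, X, y, cv=cv, scoring="f1_macro")
print(f"Macro F1: {scores.mean():.3f}")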
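To make the precision, recall, and F1 metrics from the topic list concrete, the sketch below computes them by hand for a binary problem. The function name and the toy labels are invented for illustration; in practice scikit-learn's metrics module would normally be used instead.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one class (illustrative helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels: 3 true positives, 1 false positive, 1 false negative
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

With these toy labels all three values come out to 0.75, since the single false positive and the single false negative are balanced.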

WK 05–06 Visualization & Exploratory Data Analysis

Seaborn · Plotly

Gene expression heatmaps with Seaborn
PCA for dimensionality reduction
GC sliding window plots
Interactive dashboards with Plotly
GTEx tissue expression analysis

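from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# scaled_expr: standardized expression matrix (samples × genes)
# tissue_labels: one numeric tissue code per sample, used for color
pca = PCA(n_components=2)
pcs = pca.fit_transform(scaled_expr)

plt.scatter(pcs[:, 0], pcs[:, 1], c=tissue_labels)
plt.xlabel(f"PC1 ({pca.explained_variance_ratio_[0]:.0%})")
plt.ylabel(f"PC2 ({pca.explained_variance_ratio_[1]:.0%})")
plt.savefig("pca_tissues.png", dpi=150)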
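The GC sliding window plot from the topic list needs one GC value per window before anything is drawn. A minimal sketch of that computation follows; the function name, the default window and step sizes, and the choice to report each window by its midpoint are assumptions for the example.

```python
def gc_sliding_window(sequence, window=100, step=50):
    """Return window midpoints and GC fractions along a sequence (illustrative helper)."""
    seq = sequence.upper()
    positions, gc_values = [], []
    # Slide a fixed-size window; a trailing partial window is skipped.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = (chunk.count("G") + chunk.count("C")) / window
        positions.append(start + window // 2)
        gc_values.append(gc)
    return positions, gc_values


# Toy sequence: an AT-rich half followed by a GC-rich half
positions, gc_values = gc_sliding_window("ATATATATGCGCGCGC", window=8, step=4)
print(positions)   # [4, 8, 12]
print(gc_values)   # [0.0, 0.5, 1.0]
```

The two returned lists can be passed straight to a line plot, for example `plt.plot(positions, gc_values)`.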

🧬

APPLIED TRACK

Deep Learning Bio

Weeks 7–15 · Intro College Level

WEEKS

WK 07–09 Deep Learning for DNA Sequences

PyTorch CNN

Neural networks from scratch in PyTorch
1D CNNs for DNA motif detection
Transformers and self-attention
ESM-2 protein language model
Fine-tuning pre-trained models

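import torch
import torch.nn as nn

class DNAClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(5, 64)               # 5 tokens, e.g. A, C, G, T, N
        self.conv = nn.Conv1d(64, 128, kernel_size=8)  # 128 motif detectors of width 8
        self.pool = nn.AdaptiveMaxPool1d(32)
        self.head = nn.Linear(128 * 32, 5)             # 5 output classes

    def forward(self, x):
        # x: (batch, length) tensor of integer-encoded bases
        x = self.embed(x).permute(0, 2, 1)             # (batch, 64, length)
        x = torch.relu(self.conv(x))
        return self.head(self.pool(x).flatten(1))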
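A classifier with a 5-token embedding layer expects each sequence as a list of integer indices. The sketch below shows one way to produce that input and to work out how long the sequence is after a width-8 convolution. The specific base-to-index mapping and the use of index 4 for both unknown bases and padding are assumptions, not something the curriculum specifies.

```python
# Hypothetical vocabulary: four bases plus one catch-all index
BASE_TO_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}
UNKNOWN_INDEX = 4  # N, other ambiguity codes, and padding (assumption)


def encode_dna(sequence, length):
    """Integer-encode a DNA string, truncating or padding to a fixed length."""
    indices = [BASE_TO_INDEX.get(base, UNKNOWN_INDEX)
               for base in sequence.upper()[:length]]
    indices += [UNKNOWN_INDEX] * (length - len(indices))
    return indices


def conv1d_output_length(length, kernel_size, stride=1, padding=0):
    """Output length of a 1D convolution with dilation 1."""
    return (length + 2 * padding - kernel_size) // stride + 1


print(encode_dna("acgtn", 8))           # [0, 1, 2, 3, 4, 4, 4, 4]
print(conv1d_output_length(200, 8))     # 193
```

A batch of such lists can be turned into model input with `torch.tensor(batch, dtype=torch.long)`.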

WK 10–12 Structural Biology & AlphaFold AI

py3Dmol ESM-2

Protein structure — primary to quaternary
Multiple Sequence Alignments (MSA)
Coevolution and contact map prediction
AlphaFold2 architecture overview
3D visualization with py3Dmol

import numpy as np

def mutual_info(col_i, col_j):
    # Coevolution between MSA columns
    ...

L = len(msa[0])
mi_mat = np.zeros((L, L))
for i in range(L):
    for j in range(i + 1, L):
        mi_mat[i, j] = mutual_info(
            [s[i] for s in msa],
            [s[j] for s in msa])
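The course sample leaves the body of its mutual-information function elided. As a rough illustration of the idea, one simple plug-in estimator of mutual information between two alignment columns could look like the sketch below. It is an assumption-laden example, not the curriculum's implementation: it treats gaps as ordinary symbols and applies no sequence weighting or background correction.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j):
    """Plug-in mutual information (in bits) between two MSA columns."""
    n = len(col_i)
    count_i = Counter(col_i)                 # residue counts in column i
    count_j = Counter(col_j)                 # residue counts in column j
    count_ij = Counter(zip(col_i, col_j))    # joint counts of residue pairs
    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        # p(a,b) / (p(a) p(b)) simplifies to n_ab * n / (n_a * n_b)
        ratio = n_ab * n / (count_i[a] * count_j[b])
        mi += (n_ab / n) * math.log2(ratio)
    return mi


# Perfectly covarying columns: A always pairs with T, G always with C
covarying = mutual_information(list("AAGG"), list("TTCC"))
# Independent columns: every residue pairing occurs equally often
independent = mutual_information(list("AAGG"), list("TCTC"))
print(covarying, independent)
```

High values suggest that two positions mutate together, which is the signal contact-map prediction tries to exploit; real pipelines usually correct the raw score for phylogenetic bias and small sample size.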

WK 13–15 Medical Imaging & Anomaly Detection

MONAI Grad-CAM

DICOM and NIfTI medical image formats
Transfer learning with ResNet/EfficientNet
Autoencoder anomaly detection
Grad-CAM explainability heatmaps
NIH Chest X-ray dataset analysis

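import torch
import torch.nn.functional as F

def anomaly_score(model, image):
    model.eval()
    with torch.no_grad():
        recon = model(image.unsqueeze(0))          # add a batch dimension
        mse = F.mse_loss(recon.squeeze(0), image)  # drop only the batch dimension
    return mse.item()

# High MSE → likely anomaly / pathology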
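A reconstruction error only becomes a decision once it is compared with a threshold. One common heuristic, shown here as a sketch and not as the course's method, is to set the threshold a few standard deviations above the mean error measured on images known to be normal. The function names, the 3-sigma default, and the example scores are all invented for illustration.

```python
import statistics


def fit_threshold(normal_scores, n_sigma=3.0):
    """Threshold = mean + n_sigma * sample std of scores from normal images."""
    mean = statistics.fmean(normal_scores)
    std = statistics.stdev(normal_scores)
    return mean + n_sigma * std


def is_anomaly(score, threshold):
    """Flag a reconstruction error that exceeds the fitted threshold."""
    return score > threshold


# Made-up reconstruction errors from a validation set of normal images
normal_scores = [0.010, 0.012, 0.011, 0.009, 0.013, 0.010]
threshold = fit_threshold(normal_scores)

print(is_anomaly(0.011, threshold))   # typical error
print(is_anomaly(0.050, threshold))   # much larger error
```

The choice of `n_sigma` trades false alarms against missed findings and would normally be tuned on labeled validation data.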

🚀

CAPSTONE TRACK

AI Drug Discovery

Weeks 16–22 · Advanced · Portfolio-grade output

WEEKS

WK 16–17 Graph Neural Networks in Biology

PyG RDKit

Protein interaction networks as graphs
GNNs with PyTorch Geometric
SMILES to molecular graph conversion
Drug-target binding prediction
BindingDB and ChEMBL datasets

import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool

class MolGNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(9, 64)    # 9 input features per atom
        self.conv2 = GCNConv(64, 64)
        self.head = nn.Linear(64, 1)

    def forward(self, data):
        x = F.relu(self.conv1(data.x, data.edge_index))
        x = F.relu(self.conv2(x, data.edge_index))   # second message-passing layer
        x = global_mean_pool(x, data.batch)          # one vector per molecule
        return self.head(x)
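PyTorch Geometric stores a graph's connectivity as a two-row edge list, with each undirected bond listed once in each direction. The sketch below builds that structure from a hand-written bond list so the layout is easy to see. In a real pipeline the bonds would typically come from a cheminformatics toolkit such as RDKit; the function name and the example molecule are assumptions for illustration.

```python
def bonds_to_edge_index(bonds):
    """Convert undirected bonds [(a, b), ...] into a 2 x (2 * n_bonds) edge list."""
    sources, targets = [], []
    for a, b in bonds:
        # Each bond contributes two directed edges: a -> b and b -> a.
        sources += [a, b]
        targets += [b, a]
    return [sources, targets]


# Heavy atoms of ethanol (SMILES "CCO"): C0-C1 and C1-O2
edge_index = bonds_to_edge_index([(0, 1), (1, 2)])
print(edge_index)   # [[0, 1, 1, 2], [1, 0, 2, 1]]
```

Wrapping the result in `torch.tensor(edge_index, dtype=torch.long)` gives the tensor that a `GCNConv` layer takes as its second argument.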

WK 18–19 Generative AI for Drug Design

VAE Diffusion

VAEs for molecular generation in latent space
Diffusion models for 3D molecule design
ProteinMPNN inverse folding
SMILES and molecular representations
Ethics in AI drug discovery

import torch
import torch.nn as nn

class MolVAE(nn.Module):
    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std   # sample z ~ N(mu, sigma^2)

# Sample novel molecules:
z = torch.randn(16, 256)   # batch of 16 latents
# → decode → candidate SMILES strings (check validity, e.g. with RDKit)
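The reparameterization step is usually paired with a KL divergence term that pulls the encoder's distribution toward a standard normal prior. For a diagonal Gaussian this term has a closed form, sketched below in plain Python for a single latent vector. The function name is made up, and the sample code above does not show how the course combines this term with the reconstruction loss.

```python
import math


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, sigma^2) || N(0, 1) ) summed over latent dimensions.

    Uses the closed form -0.5 * sum(1 + logvar - mu^2 - exp(logvar)).
    """
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))


# Encoder output that already matches the prior: no penalty
kl_prior = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])
# Mean shifted by 1 in one dimension: penalty of 0.5
kl_shifted = kl_to_standard_normal([1.0], [0.0])
print(kl_prior, kl_shifted)
```

In PyTorch the same expression is commonly written as `-0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())`.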

WK 20–22 Capstone — End-to-End ML Pipeline

FastAPI Docker

MLflow experiment tracking and model registry
FastAPI REST endpoint deployment
Docker container packaging
GitHub Actions CI/CD pipeline
Portfolio presentation and README writing

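from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Genome Classifier v2")

class SeqRequest(BaseModel):
    sequence: str

# model (a scikit-learn-style classifier) and extract_kmer are assumed
# to be loaded and defined elsewhere in the project
@app.post("/classify")
async def classify(req: SeqRequest):
    features = extract_kmer(req.sequence, k=4)
    probs = model.predict_proba([features])[0]
    best = probs.argmax()
    return {"organism": str(model.classes_[best]),
            "confidence": float(probs[best])}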

STUDY MATERIALS

Download PDF Guides

Week 1–2: Biology Meets Data

12 pages · Dark navy theme · Annotated code · Quiz preview

Foundation Download PDF

Week 7–9: Deep Learning for Sequences

14 pages · Neural net diagrams · CNN filter visualization

Applied Download PDF

All 9 Week Guides — Full Bundle

9 PDFs · All tracks covered · Available to enrolled students

Bundle Enroll to Download
