明知山没虎


Docker in One Article: From Getting Started to Server Deployment

2025-12-24

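Preface

Having tinkered with Docker for seven or eight years, I've seen plenty of people get tangled up in its concepts, and countless projects thrown into chaos by environment problems. This article walks you through Docker in the plainest terms possible, from the basic concepts all the way to server deployment.

Docker's core value comes down to two words: environment consistency. Whatever runs on your machine, once packaged as a Docker image, runs the same way on any host with a compatible kernel and CPU architecture. It's that simple.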

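What on Earth Is Docker?

Docker in Plain Language

Think of Docker as an ultra-lightweight virtual machine, only smarter:

  • Virtual machine: virtualizes an entire operating system, which makes it heavy

  • Docker: isolates only the application and its dependencies while sharing the host's kernel, which makes it fast and light

An Analogy

  • A virtual machine is like buying a whole house: every house has its own water, electricity, and gas

  • Docker is like staying in a hotel: the infrastructure is shared, but each room is isolated from the others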

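Core Concepts (Understand Them in 5 Minutes)

Concept    | Explanation                                                      | Analogy
-----------|------------------------------------------------------------------|----------------------------------------------------
Image      | A read-only template containing everything the app needs to run  | An ISO disc image
Container  | A running instance of an image                                   | The program running after installing from the disc
Dockerfile | The script used to build an image                                | The step-by-step installation manual
Registry   | The place where images are stored                                | An app store or software download site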

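Installing Docker (Ubuntu Example)

Method 1: Official Script (Recommended)

# One-step install script
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

# Add the current user to the docker group (avoids typing sudo every time)
sudo usermod -aG docker $USER

# Log out and back in, or run
newgrp docker

# Enable start on boot
sudo systemctl enable docker
sudo systemctl start docker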

Method 2: Manual Installation

# Update the package index
sudo apt update

# Install dependencies
sudo apt install apt-transport-https ca-certificates curl gnupg lsb-release

# Add Docker's official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

# Add the Docker repository (dpkg reports this machine's architecture, e.g. amd64 or arm64)
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker
sudo apt update
sudo apt install docker-ce docker-ce-cli containerd.io

# Verify the installation
docker --version
docker run hello-world

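Installing Docker Compose

# Download the standalone Docker Compose binary
# (recent Docker packages also offer Compose as the "docker compose" plugin)
sudo curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose

# Make it executable
sudo chmod +x /usr/local/bin/docker-compose

# Verify the installation
docker-compose --version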

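Docker Basic Commands in Practice

Image Operations

# Search for an image
docker search nginx

# Pull an image
docker pull nginx:latest

# List local images
docker images

# Remove an image
docker rmi nginx:latest

# Build an image
docker build -t my-app:v1.0 .

# Tag an image
docker tag my-app:v1.0 my-app:latest

# Push the image to a registry (Docker Hub needs the tag prefixed with your account name)
docker tag my-app:v1.0 your-username/my-app:v1.0
docker push your-username/my-app:v1.0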
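
A push only works when the tag names a repository you are allowed to write to, so it helps to build the full reference in one place. Here is a minimal sketch, assuming a hypothetical helper called `image_ref` and placeholder registry and account names (none of them come from this article):

```shell
# image_ref REGISTRY NAMESPACE NAME [TAG]
# Prints a fully qualified image reference. TAG defaults to "latest".
# An empty REGISTRY means the default registry (Docker Hub).
image_ref() (
  registry=$1 namespace=$2 name=$3 tag=${4:-latest}
  if [ -n "$registry" ]; then
    printf '%s/%s/%s:%s\n' "$registry" "$namespace" "$name" "$tag"
  else
    printf '%s/%s:%s\n' "$namespace" "$name" "$tag"
  fi
)

# Examples with placeholder values
image_ref "" "your-username" "my-app" "v1.0"
image_ref "registry.example.com" "team" "my-app"
```

The output can be fed straight into `docker tag` and `docker push`, for example `docker tag my-app:v1.0 "$(image_ref "" your-username my-app v1.0)"`.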

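Container Operations

# Run a container
docker run -d --name web-server -p 80:80 nginx

# List running containers
docker ps

# List all containers (including stopped ones)
docker ps -a

# Stop a container
docker stop web-server

# Start a container
docker start web-server

# Restart a container
docker restart web-server

# Remove a container
docker rm web-server

# Open a shell inside the container
docker exec -it web-server /bin/bash

# View container logs
docker logs web-server
docker logs -f web-server  # follow in real time

# Show detailed container information
docker inspect web-server

# Show container resource usage
docker stats web-server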

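Cleanup Commands

# Remove all stopped containers
docker container prune

# Remove dangling images (add -a to remove every image not used by a container)
docker image prune

# Remove all unused networks
docker network prune

# Remove unused volumes
docker volume prune

# Clean up all unused resources in one go (volumes are kept unless you add --volumes)
docker system prune -a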

Dockerfile Hands-On Tutorial

Basic Syntax

# Base image
FROM ubuntu:20.04

# Maintainer information
LABEL maintainer="your-email@example.com"

# Set environment variables
ENV NODE_VERSION=18.0.0
ENV APP_ENV=production

# Set the working directory
WORKDIR /app

# Copy files
COPY package*.json ./
COPY src/ ./src/

# Run commands
RUN apt-get update && \
    apt-get install -y curl && \
    curl -fsSL https://deb.nodesource.com/setup_18.x | bash - && \
    apt-get install -y nodejs

# Install dependencies
RUN npm ci --only=production

# Expose the port
EXPOSE 3000

# Set the runtime user (the ubuntu base image has no "node" user, so create it first)
RUN useradd --create-home node
USER node

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD curl -f http://localhost:3000/health || exit 1

# Startup command
CMD ["npm", "start"]

Case Study 1: Node.js Application

# Project layout
# my-node-app/
# ├── Dockerfile
# ├── package.json
# ├── src/
# │   └── app.js
# └── .dockerignore

# Dockerfile
FROM node:18-alpine

# Set the working directory
WORKDIR /usr/src/app

# Copy the package files
COPY package*.json ./

# Install dependencies
RUN npm ci --only=production && npm cache clean --force

# Create a non-root user that belongs to its own group
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001 -G nodejs

# Copy the source code
COPY --chown=nodejs:nodejs . .

# Switch to the non-root user
USER nodejs

# Expose the port
EXPOSE 3000

# Health check (expects a healthcheck.js script in the project root)
HEALTHCHECK --interval=10s --timeout=3s --start-period=5s --retries=3 \
  CMD node healthcheck.js

# Startup command
CMD ["node", "src/app.js"]

# .dockerignore
node_modules
npm-debug.log
.git
.gitignore
README.md
.env
coverage
.nyc_output
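
A forgotten `.dockerignore` entry is an easy way to leak `.env` or `.git` into the build context. As a rough sketch, a small check like the one below could run before `docker build`; the function name `missing_ignores` and the list of patterns are illustrative assumptions, not part of any Docker tooling:

```shell
# missing_ignores FILE PATTERN...
# Prints every PATTERN that has no exactly matching line in FILE.
# A missing FILE counts as "nothing is ignored".
missing_ignores() (
  file=$1; shift
  for pattern in "$@"; do
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    grep -qxF -- "$pattern" "$file" 2>/dev/null || printf '%s\n' "$pattern"
  done
)

# Report entries that the .dockerignore in the current directory lacks
missing_ignores .dockerignore node_modules .git .env
```

It only looks for exact lines, so equivalent glob patterns such as `**/.env` would be reported as missing.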

Case Study 2: Python Flask Application

FROM python:3.9-slim

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

# Set the working directory
WORKDIR /app

# Install system dependencies (curl is needed by the health check below)
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        postgresql-client \
        curl \
    && rm -rf /var/lib/apt/lists/*

# Copy the requirements file
COPY requirements.txt .

# Install Python dependencies (gunicorn must be listed in requirements.txt)
RUN pip install --no-cache-dir -r requirements.txt

# Copy the project files
COPY . .

# Create a non-root user
RUN adduser --disabled-password --gecos '' appuser
RUN chown -R appuser:appuser /app
USER appuser

# Expose the port
EXPOSE 5000

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD curl -f http://localhost:5000/health || exit 1

# Startup command
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "app:app"]

Case Study 3: Static Website (Nginx)

# Multi-stage build example
FROM node:18-alpine AS builder

WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Production stage
FROM nginx:alpine

# Copy the build output
COPY --from=builder /app/dist /usr/share/nginx/html

# Copy the custom nginx configuration
COPY nginx.conf /etc/nginx/nginx.conf

# Expose the port
EXPOSE 80

# Start nginx
CMD ["nginx", "-g", "daemon off;"]

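Dockerfile Best Practices

# 1. Use an official image as the base image
FROM node:18-alpine

# 2. Combine RUN instructions to reduce the number of layers
RUN apk add --no-cache \
    git \
    curl \
    && rm -rf /var/cache/apk/*

# 3. Take advantage of the build cache: copy dependency manifests first
COPY package*.json ./
RUN npm ci --only=production

# 4. Copy the source code last
COPY . .

# 5. Run as a non-root user
USER node

# 6. Prefer COPY over ADD
COPY src/ ./src/

# 7. Set a sensible WORKDIR
WORKDIR /usr/src/app

# 8. Clean up unnecessary files (this form is for apt-based images such as Debian or Ubuntu;
#    on Alpine use apk as in step 2)
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        ca-certificates && \
    rm -rf /var/lib/apt/lists/*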

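Docker Compose in Practice

Basic Concepts

Docker Compose is a tool for defining and running multi-container Docker applications. Put simply, it manages multiple containers from a single YAML file.

Case Study 1: WordPress + MySQL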

# docker-compose.yml
version: '3.8'

services:
  # MySQL database
  db:
    image: mysql:8.0
    container_name: wordpress_db
    restart: unless-stopped
    environment:
      MYSQL_ROOT_PASSWORD: rootpassword
      MYSQL_DATABASE: wordpress
      MYSQL_USER: wordpress
      MYSQL_PASSWORD: wordpress
    volumes:
      - mysql_data:/var/lib/mysql
    networks:
      - wordpress_network

  # WordPress application
  wordpress:
    image: wordpress:latest
    container_name: wordpress_app
    restart: unless-stopped
    ports:
      - "8080:80"
    environment:
      WORDPRESS_DB_HOST: db:3306
      WORDPRESS_DB_USER: wordpress
      WORDPRESS_DB_PASSWORD: wordpress
      WORDPRESS_DB_NAME: wordpress
    volumes:
      - wordpress_data:/var/www/html
    depends_on:
      - db
    networks:
      - wordpress_network

  # Nginx reverse proxy
  nginx:
    image: nginx:alpine
    container_name: wordpress_nginx
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf
      - ./nginx/ssl:/etc/nginx/ssl
      - wordpress_data:/var/www/html
    depends_on:
      - wordpress
    networks:
      - wordpress_network

volumes:
  mysql_data:
  wordpress_data:

networks:
  wordpress_network:
    driver: bridge

Case Study 2: Full-Stack Application (Frontend + Backend + Database + Redis)

version: '3.8'

services:
  # PostgreSQL database
  postgres:
    image: postgres:14-alpine
    container_name: app_postgres
    restart: unless-stopped
    environment:
      POSTGRES_DB: myapp
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: password
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    ports:
      - "5432:5432"
    networks:
      - backend

  # Redis cache
  redis:
    image: redis:7-alpine
    container_name: app_redis
    restart: unless-stopped
    command: redis-server --appendonly yes
    volumes:
      - redis_data:/data
    ports:
      - "6379:6379"
    networks:
      - backend

  # Backend API
  backend:
    build:
      context: ./backend
      dockerfile: Dockerfile
    container_name: app_backend
    restart: unless-stopped
    environment:
      NODE_ENV: production
      DATABASE_URL: postgresql://postgres:password@postgres:5432/myapp
      REDIS_URL: redis://redis:6379
      JWT_SECRET: your-secret-key
    ports:
      - "3000:3000"
    depends_on:
      - postgres
      - redis
    volumes:
      - ./backend/uploads:/app/uploads
    networks:
      - backend
      - frontend
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  # Frontend application
  frontend:
    build:
      context: ./frontend
      dockerfile: Dockerfile
    container_name: app_frontend
    restart: unless-stopped
    ports:
      - "80:80"
    depends_on:
      - backend
    networks:
      - frontend

  # Monitoring services
  prometheus:
    image: prom/prometheus:latest
    container_name: app_prometheus
    restart: unless-stopped
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
    networks:
      - monitoring

  grafana:
    image: grafana/grafana:latest
    container_name: app_grafana
    restart: unless-stopped
    ports:
      - "3001:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin
    volumes:
      - grafana_data:/var/lib/grafana
    networks:
      - monitoring

volumes:
  postgres_data:
  redis_data:
  prometheus_data:
  grafana_data:

networks:
  backend:
    driver: bridge
  frontend:
    driver: bridge
  monitoring:
    driver: bridge

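Common Compose Commands

# Start all services
docker-compose up -d

# Show service status
docker-compose ps

# View the logs of a service
docker-compose logs -f backend

# Stop all services
docker-compose down

# Stop services and delete their volumes
docker-compose down -v

# Rebuild a service
docker-compose build backend

# Restart a service
docker-compose restart backend

# Scale a service to several instances
# (the service must not set container_name or publish a fixed host port)
docker-compose up -d --scale backend=3

# Run a command inside a service container
docker-compose exec backend bash

# Show the resolved configuration
docker-compose config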
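
The commands above use the standalone `docker-compose` binary, while newer installations often ship Compose as the `docker compose` plugin instead. A script that has to run on both can detect which one is present. This is only a sketch; the function name `compose_cmd` is made up for illustration:

```shell
# compose_cmd: prints the Compose invocation available on this machine.
# Prefers the "docker compose" plugin, falls back to the standalone
# "docker-compose" binary, and returns 1 if neither is found.
compose_cmd() {
  if docker compose version >/dev/null 2>&1; then
    echo "docker compose"
  elif command -v docker-compose >/dev/null 2>&1; then
    echo "docker-compose"
  else
    return 1
  fi
}

if cmd=$(compose_cmd); then
  echo "Using: $cmd"
else
  echo "Compose is not installed" >&2
fi
```

In a script, the result is used unquoted so that the two-word form splits correctly, for example `$cmd up -d`.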

Server Deployment in Practice

Preparing the Server Environment

# Connect to the server
ssh root@your-server-ip

# Update the system
apt update && apt upgrade -y

# Install essential tools
apt install -y curl wget git vim ufw

# Configure the firewall
ufw allow ssh
ufw allow http
ufw allow https
ufw enable

# Install Docker (same method as described earlier)
curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh

# Install Docker Compose
curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose

Deployment Option 1: Direct Deployment

# Create the project directory
mkdir -p /opt/myapp
cd /opt/myapp

# Clone the project (or upload the files)
git clone https://github.com/your-username/your-project.git .

# Create the environment variable file
cat > .env << EOF
NODE_ENV=production
DATABASE_URL=postgresql://postgres:your-password@postgres:5432/myapp
REDIS_URL=redis://redis:6379
JWT_SECRET=your-super-secret-key
DOMAIN=yourdomain.com
EOF

# Start the services
docker-compose up -d

# Check status
docker-compose ps
docker-compose logs -f

Deployment Option 2: Automated Deployment with CI/CD

# GitHub Actions configuration
# .github/workflows/deploy.yml
name: Deploy to Server

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    
    steps:
    - uses: actions/checkout@v3
    
    - name: Build and push Docker image
      run: |
        # Docker Hub repositories are namespaced by account, so prefix the image name
        IMAGE="${{ secrets.DOCKER_USERNAME }}/myapp"
        echo "${{ secrets.DOCKER_PASSWORD }}" | docker login -u "${{ secrets.DOCKER_USERNAME }}" --password-stdin
        docker build -t "$IMAGE:${{ github.sha }}" .
        docker tag "$IMAGE:${{ github.sha }}" "$IMAGE:latest"
        docker push "$IMAGE:${{ github.sha }}"
        docker push "$IMAGE:latest"
    
    - name: Deploy to server
      uses: appleboy/ssh-action@v0.1.5
      with:
        host: ${{ secrets.HOST }}
        username: ${{ secrets.USERNAME }}
        key: ${{ secrets.SSH_KEY }}
        script: |
          cd /opt/myapp
          docker-compose pull
          docker-compose up -d
          docker image prune -f

Deployment Option 3: Docker Swarm Cluster

# Initialize the Swarm cluster
docker swarm init

# Create an overlay network
docker network create -d overlay app-network

# Deploy the stack
docker stack deploy -c docker-compose.yml myapp

# Check the stack status
docker stack ls
docker stack services myapp

# Scale a service
docker service scale myapp_backend=3

# Update a service
docker service update --image myapp:v2.0 myapp_backend

Configuring the Reverse Proxy (Nginx)

# /etc/nginx/sites-available/myapp
server {
    listen 80;
    server_name yourdomain.com www.yourdomain.com;
    
    # Redirect to HTTPS
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl http2;
    server_name yourdomain.com www.yourdomain.com;
    
    # SSL certificate configuration
    ssl_certificate /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;

    # SSL tuning (the cipher names must be ones OpenSSL knows; AES256-GCM pairs with SHA384)
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256;
    ssl_prefer_server_ciphers off;
    ssl_session_cache shared:SSL:10m;

    # Security headers
    add_header Strict-Transport-Security "max-age=63072000" always;
    add_header X-Frame-Options DENY;
    add_header X-Content-Type-Options nosniff;
    add_header X-XSS-Protection "1; mode=block";
    
    # Reverse proxy to the Docker container
    location / {
        proxy_pass http://localhost:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_cache_bypass $http_upgrade;
    }
    
    # Cache static files
    location ~* \.(jpg|jpeg|png|gif|ico|css|js)$ {
        proxy_pass http://localhost:3000;
        expires 1y;
        add_header Cache-Control "public, immutable";
    }
}
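
# Enable the site
# (nginx -t fails while the certificate files referenced above do not exist yet;
#  on a fresh server obtain the certificate first or enable the 443 block afterwards)
ln -s /etc/nginx/sites-available/myapp /etc/nginx/sites-enabled/
nginx -t
systemctl reload nginx

# Obtain an SSL certificate
apt install certbot python3-certbot-nginx
certbot --nginx -d yourdomain.com -d www.yourdomain.com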

Monitoring and Logging

Monitoring Configuration

# monitoring/docker-compose.yml
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3001:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin123
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/dashboards:/etc/grafana/provisioning/dashboards
      - ./grafana/datasources:/etc/grafana/provisioning/datasources

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    ports:
      - "9100:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro

volumes:
  prometheus_data:
  grafana_data:

Log Management

# Centralized logging configuration
version: '3.8'

services:
  app:
    image: myapp:latest
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
    
  # Use the ELK Stack
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.15.0
    environment:
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    volumes:
      - elasticsearch_data:/usr/share/elasticsearch/data

  logstash:
    image: docker.elastic.co/logstash/logstash:7.15.0
    volumes:
      - ./logstash/pipeline:/usr/share/logstash/pipeline
    depends_on:
      - elasticsearch

  kibana:
    image: docker.elastic.co/kibana/kibana:7.15.0
    ports:
      - "5601:5601"
    environment:
      ELASTICSEARCH_HOSTS: http://elasticsearch:9200
    depends_on:
      - elasticsearch

volumes:
  elasticsearch_data:

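Security Best Practices

Container Security

# Example of a hardened Dockerfile
FROM node:18-alpine

# Apply pending package updates
RUN apk upgrade --no-cache

# Create a non-root user that belongs to its own group
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001 -G nodejs

# Set the working directory
WORKDIR /app

# Copy the package files
COPY package*.json ./

# Install dependencies
RUN npm ci --only=production && \
    npm cache clean --force

# Copy the application code
COPY --chown=nodejs:nodejs . .

# Remove packages that are not needed at runtime
RUN apk del apk-tools

# Switch to the non-root user
USER nodejs

# Run with a read-only root filesystem:
# docker run --read-only --tmpfs /tmp myapp

EXPOSE 3000
CMD ["node", "index.js"]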

Network Security

# Network isolation configuration
version: '3.8'

services:
  frontend:
    image: myapp-frontend
    networks:
      - frontend-network
    ports:
      - "80:80"

  backend:
    image: myapp-backend
    networks:
      - frontend-network
      - backend-network
    # No ports published to the host

  database:
    image: postgres:14
    networks:
      - backend-network
    # Fully isolated: only backend can reach it

networks:
  frontend-network:
  backend-network:
    internal: true  # Internal network with no access to the outside world

Environment Variables and Secret Management

# Use Docker Secrets (Swarm mode); printf avoids storing a trailing newline in the secret
printf '%s' "mypassword" | docker secret create db_password -

# Reference the secret in the compose file
version: '3.8'
services:
  app:
    image: myapp
    secrets:
      - db_password
    environment:
      DB_PASSWORD_FILE: /run/secrets/db_password

secrets:
  db_password:
    external: true

# Use an environment variable file
cat > .env << EOF
# Database settings
DB_HOST=postgres
DB_USER=myuser
DB_PASSWORD=strongpassword
DB_NAME=myapp

# Application settings
JWT_SECRET=your-jwt-secret-key
API_KEY=your-api-key

# Mail settings
SMTP_HOST=smtp.gmail.com
SMTP_USER=your-email@gmail.com
SMTP_PASS=your-app-password
EOF

# Restrict the file permissions
chmod 600 .env
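
An `.env` file with a missing or empty key tends to surface only as a confusing runtime error inside a container, so it can be worth checking before `docker-compose up`. Below is a minimal sketch; `missing_env_keys` is a hypothetical helper, and which keys are mandatory depends on your application:

```shell
# missing_env_keys FILE KEY...
# Prints each KEY that has no "KEY=value" line with a non-empty value in FILE.
missing_env_keys() (
  file=$1; shift
  for key in "$@"; do
    # ".+" after the equals sign rejects lines such as "DB_PASSWORD="
    grep -Eq "^${key}=.+" "$file" 2>/dev/null || printf '%s\n' "$key"
  done
)

# List required keys that the .env file in the current directory does not define
missing_env_keys .env DB_HOST DB_PASSWORD JWT_SECRET
```

An empty output means every listed key is set; commented-out lines are ignored because they start with `#` rather than the key name.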

Performance Optimization

Image Optimization

# Multi-stage build
# Build stage
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Production stage
FROM node:18-alpine AS production
WORKDIR /app

# Copy only the build output, then install runtime dependencies
# (copying node_modules from the builder would also bring in devDependencies)
COPY --from=builder /app/dist ./dist
COPY package*.json ./
RUN npm ci --only=production && npm cache clean --force

USER node
EXPOSE 3000
CMD ["node", "dist/index.js"]

Resource Limits

version: '3.8'

services:
  app:
    image: myapp
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
        reservations:
          cpus: '0.25'
          memory: 256M
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3

Health Checks

services:
  app:
    image: myapp
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

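Troubleshooting

Common Debugging Commands

# Show detailed container information
docker inspect container_name

# List the processes inside a container
docker exec container_name ps aux

# Inspect container networks
docker network ls
docker network inspect network_name

# Inspect container volumes
docker volume ls
docker volume inspect volume_name

# Show container resource usage
docker stats

# Export the container filesystem
docker export container_name > container.tar

# Show the image build history
docker history image_name

# Show changes to the container filesystem
docker diff container_name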

Log Debugging

# View container logs
docker logs -f --tail 100 container_name

# View logs for a specific time window
docker logs --since "2023-01-01T00:00:00" --until "2023-01-02T00:00:00" container_name

# View the logs of a Compose service
docker-compose logs -f service_name

# Follow the logs of all services in real time
docker-compose logs -f

# Watch Docker events
docker events

# View the Docker daemon logs
journalctl -u docker.service -f
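
When a log is long, a quick count of error lines is often more useful than scrolling. The sketch below assumes the application writes a level word such as `ERROR` on each line, which depends entirely on your logging format; `count_level` is a made-up helper name:

```shell
# count_level LEVEL
# Reads log lines on stdin and prints how many contain LEVEL as a whole word.
count_level() {
  # grep -c exits with status 1 when the count is 0; "|| true" hides that
  # so the function is safe inside scripts that run under "set -e"
  grep -c -w -- "$1" || true
}

# Demo input; this prints 2
printf 'INFO started\nERROR db timeout\nERROR retry failed\n' | count_level ERROR
```

With a real container the input would come from something like `docker logs --since 1h container_name 2>&1 | count_level ERROR`.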

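Solving Common Problems

Problem 1: Container Fails to Start

# Check container status
docker ps -a

# View the startup logs
docker logs container_name

# Enter the container to debug (only possible while it is still running)
docker exec -it container_name /bin/sh

# Run in debug mode
docker run -it --entrypoint=/bin/sh image_name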

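Problem 2: Network Connectivity Issues

# Check the network configuration
docker network ls
docker network inspect bridge

# Test connectivity between containers
docker exec container1 ping container2

# Check port mappings
docker port container_name

# Check firewall rules
iptables -L
ufw status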

Problem 3: Running Out of Disk Space

# Show Docker disk usage
docker system df

# Clean up unused resources
docker system prune -a

# Clean up specific resource types
docker container prune
docker image prune -a
docker volume prune
docker network prune

# Clear the build cache
docker builder prune
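
Pruning is easier to automate when a script can tell how full the disk actually is. A small sketch using POSIX `df -P` follows; the function name `disk_usage_pct` and the 85% threshold are arbitrary assumptions, and the parsing breaks if the filesystem name contains spaces:

```shell
# disk_usage_pct PATH
# Prints the used-space percentage (0-100, without the % sign) of the
# filesystem that holds PATH, taken from the Capacity column of "df -P".
disk_usage_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

threshold=85   # hypothetical alert threshold in percent
pct=$(disk_usage_pct /)
if [ "$pct" -ge "$threshold" ]; then
  echo "Disk is ${pct}% full: consider running docker system prune"
fi
```

On a Docker host you would normally point it at the Docker data directory, which defaults to `/var/lib/docker`, instead of `/`.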

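Problem 4: Performance Issues

# Show resource usage
docker stats

# Show memory details from inside the container
# (/proc/meminfo normally reports host-wide figures, so rely on docker stats for per-container usage)
docker exec container_name cat /proc/meminfo

# Show the container's processes
docker exec container_name top

# Limit container resources
docker run -m 512m --cpus="1.5" image_name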
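
The `-m` value uses binary multiples, so `512m` means 512 × 1024 × 1024 bytes, which is also the number `docker inspect` reports back. Here is a rough sketch of that conversion; `mem_to_bytes` is an illustrative helper that only handles whole numbers with an optional `b`, `k`, `m`, or `g` suffix:

```shell
# mem_to_bytes VALUE
# Converts a size such as 512m or 2g into bytes using binary multiples
# (1k = 1024 bytes). Returns 1 for anything it cannot parse.
mem_to_bytes() (
  value=$1
  number=${value%[bkmgBKMG]}      # strip one trailing unit letter, if any
  unit=${value#"$number"}         # whatever was stripped
  case $number in
    ''|*[!0-9]*) echo "invalid size: $value" >&2; return 1 ;;
  esac
  case $unit in
    ''|b|B) echo "$number" ;;
    k|K)    echo $((number * 1024)) ;;
    m|M)    echo $((number * 1024 * 1024)) ;;
    g|G)    echo $((number * 1024 * 1024 * 1024)) ;;
  esac
)

mem_to_bytes 512m   # prints 536870912
```

This makes it easy to compare a configured limit with the byte counts that monitoring tools report.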

Production Deployment Options

High-Availability Architecture

# High-availability deployment configuration
version: '3.8'

services:
  # Load balancer
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf
      - ./nginx/ssl:/etc/ssl
      - ./nginx/logs:/var/log/nginx
    depends_on:
      - app
    restart: unless-stopped
    deploy:
      replicas: 2
      placement:
        constraints:
          - node.role == manager

  # Application service (multiple replicas)
  app:
    image: myapp:latest
    environment:
      - NODE_ENV=production
      - DATABASE_URL=${DATABASE_URL}
      - REDIS_URL=${REDIS_URL}
    volumes:
      - app_uploads:/app/uploads
    depends_on:
      - postgres-master
      - redis-master
    restart: unless-stopped
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
        order: start-first
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
      resources:
        limits:
          memory: 512M
          cpus: '0.5'

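  # Database primary/replica replication
  # (the official postgres image does not read POSTGRES_REPLICATION_* or POSTGRES_MASTER_SERVICE;
  #  these variables assume custom init scripts or an image that supports them)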
  postgres-master:
    image: postgres:14
    environment:
      POSTGRES_DB: myapp
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_REPLICATION_USER: replicator
      POSTGRES_REPLICATION_PASSWORD: ${REPLICATION_PASSWORD}
    volumes:
      - postgres_master_data:/var/lib/postgresql/data
      - ./postgres/master.conf:/etc/postgresql/postgresql.conf
    command: postgres -c config_file=/etc/postgresql/postgresql.conf
    deploy:
      placement:
        constraints:
          - node.labels.postgres-master == true

  postgres-slave:
    image: postgres:14
    environment:
      POSTGRES_MASTER_SERVICE: postgres-master
      POSTGRES_REPLICATION_USER: replicator
      POSTGRES_REPLICATION_PASSWORD: ${REPLICATION_PASSWORD}
    volumes:
      - postgres_slave_data:/var/lib/postgresql/data
    deploy:
      replicas: 2
      placement:
        constraints:
          - node.labels.postgres-slave == true

  # Redis cluster (one primary plus replicas)
  redis-master:
    image: redis:7-alpine
    command: redis-server --appendonly yes --requirepass ${REDIS_PASSWORD}
    volumes:
      - redis_master_data:/data

  redis-slave:
    image: redis:7-alpine
    command: redis-server --slaveof redis-master 6379 --requirepass ${REDIS_PASSWORD} --masterauth ${REDIS_PASSWORD}
    volumes:
      - redis_slave_data:/data
    deploy:
      replicas: 2

volumes:
  postgres_master_data:
  postgres_slave_data:
  redis_master_data:
  redis_slave_data:
  app_uploads:

networks:
  default:
    driver: overlay
    attachable: true

Zero-Downtime Deployment Strategy

#!/bin/bash
# Zero-downtime deployment script: deploy.sh

set -e

APP_NAME="myapp"
IMAGE_NAME="myapp:latest"
CONTAINER_NAME="${APP_NAME}_blue"
BACKUP_CONTAINER="${APP_NAME}_green"

echo "Starting zero-downtime deployment..."

# 1. Pull the latest image
echo "Pulling the latest image..."
docker pull "$IMAGE_NAME"

# 2. Start the new container (blue-green deployment)
echo "Starting the new container..."
docker run -d \
  --name "$BACKUP_CONTAINER" \
  --network myapp_network \
  -e NODE_ENV=production \
  -e DATABASE_URL="${DATABASE_URL}" \
  -v myapp_uploads:/app/uploads \
  "$IMAGE_NAME"

# 3. Wait for the new container to pass its health check
echo "Waiting for the health check..."
for i in {1..30}; do
  if docker exec "$BACKUP_CONTAINER" curl -f http://localhost:3000/health; then
    echo "Health check passed"
    break
  fi
  if [ "$i" -eq 30 ]; then
    echo "Health check failed, rolling back"
    docker rm -f "$BACKUP_CONTAINER"
    exit 1
  fi
  sleep 2
done

# 4. Update the load balancer configuration
echo "Updating the load balancer..."
./update_nginx_upstream.sh "$BACKUP_CONTAINER"

# 5. Stop the old container (match the exact name, not a substring)
echo "Stopping the old container..."
if docker ps --format '{{.Names}}' | grep -qx "$CONTAINER_NAME"; then
  docker stop "$CONTAINER_NAME"
  docker rm "$CONTAINER_NAME"
fi

# 6. Rename the new container
docker rename "$BACKUP_CONTAINER" "$CONTAINER_NAME"

echo "Deployment complete!"
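
The health-check loop in step 3 is a general "retry until it works" pattern, and pulling it into a function keeps deployment scripts shorter. This is a sketch rather than a drop-in replacement; the name `retry` and its argument order are my own choices:

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs are used up.
# Returns 0 on the first success and 1 if every attempt failed.
retry() (
  attempts=$1 delay=$2; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # No point in sleeping after the final attempt
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
)

# Trivial demonstration: the command succeeds on the first try
retry 3 1 true && echo "command succeeded"
```

In the deployment script it could be used as `retry 30 2 docker exec "$BACKUP_CONTAINER" curl -f http://localhost:3000/health || { docker rm -f "$BACKUP_CONTAINER"; exit 1; }`.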

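Backup and Restore Strategy

#!/bin/bash
# Automated backup script: backup.sh

set -euo pipefail

BACKUP_DIR="/opt/backups"
DATE=$(date +%Y%m%d_%H%M%S)

# Create the backup directory
mkdir -p "$BACKUP_DIR"

# 1. Database backup
echo "Backing up the database..."
docker exec postgres_master pg_dump -U postgres myapp | gzip > "$BACKUP_DIR/postgres_$DATE.sql.gz"

# 2. Redis backup
#    BGSAVE runs in the background, so wait (up to 30s) until LASTSAVE changes before copying
echo "Backing up Redis..."
last_save=$(docker exec redis_master redis-cli LASTSAVE)
docker exec redis_master redis-cli BGSAVE
for _ in $(seq 1 30); do
  [ "$(docker exec redis_master redis-cli LASTSAVE)" != "$last_save" ] && break
  sleep 1
done
docker cp redis_master:/data/dump.rdb "$BACKUP_DIR/redis_$DATE.rdb"

# 3. Application data backup
echo "Backing up application data..."
docker run --rm \
  -v myapp_uploads:/data \
  -v "$BACKUP_DIR":/backup \
  alpine tar czf "/backup/uploads_$DATE.tar.gz" -C /data .

# 4. Configuration backup
echo "Backing up configuration files..."
tar czf "$BACKUP_DIR/config_$DATE.tar.gz" /opt/myapp

# 5. Remove old backups (keep 7 days)
find "$BACKUP_DIR" -name "*.gz" -mtime +7 -delete
find "$BACKUP_DIR" -name "*.rdb" -mtime +7 -delete

# 6. Upload to cloud storage (optional)
# aws s3 sync $BACKUP_DIR s3://my-backup-bucket/

echo "Backup complete: $DATE"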
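
#!/bin/bash
# Restore script: restore.sh

BACKUP_FILE=$1
BACKUP_DIR="/opt/backups"

if [ -z "$BACKUP_FILE" ]; then
  echo "Usage: ./restore.sh <backup_date>"
  echo "Available backups:"
  ls -la "$BACKUP_DIR"
  exit 1
fi

# 1. Stop the application but keep the data stores running
#    (the steps below need the database containers to exist and accept commands;
#     "app" is assumed to be the Compose service name of the application)
echo "Stopping the application..."
docker-compose stop app

# 2. Restore the database
echo "Restoring the database..."
gunzip -c "$BACKUP_DIR/postgres_$BACKUP_FILE.sql.gz" | docker exec -i postgres_master psql -U postgres myapp

# 3. Restore Redis
#    Replace dump.rdb while Redis is stopped so a shutdown save cannot overwrite it.
#    With appendonly enabled Redis loads the AOF instead, so this assumes RDB-only persistence.
echo "Restoring Redis..."
docker stop redis_master
docker cp "$BACKUP_DIR/redis_$BACKUP_FILE.rdb" redis_master:/data/dump.rdb
docker start redis_master

# 4. Restore application data
echo "Restoring application data..."
docker run --rm \
  -v myapp_uploads:/data \
  -v "$BACKUP_DIR":/backup \
  alpine tar xzf "/backup/uploads_$BACKUP_FILE.tar.gz" -C /data

# 5. Start the services
echo "Starting services..."
docker-compose up -d

echo "Restore complete: $BACKUP_FILE"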
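
A backup that cannot be read is discovered at the worst possible moment, so a quick integrity check after each run is cheap insurance. The sketch below covers only the gzip archives and only proves they decompress cleanly, not that their contents are complete; `verify_backup` is a hypothetical helper name:

```shell
# verify_backup FILE
# Succeeds only if FILE exists, is non-empty, and passes gzip's integrity test.
verify_backup() {
  [ -s "$1" ] && gzip -t "$1" 2>/dev/null
}

# Check every gzip archive in the backup directory used above
for f in /opt/backups/*.gz; do
  [ -e "$f" ] || continue   # the glob matched nothing
  verify_backup "$f" || echo "corrupt or empty backup: $f" >&2
done
```

A periodic test restore into a scratch environment remains the only real proof that the backups work.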

Monitoring and Alerting

# monitoring/alertmanager.yml
global:
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'alerts@yourdomain.com'
  smtp_auth_username: 'alerts@yourdomain.com'
  smtp_auth_password: 'your-app-password'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'

receivers:
- name: 'web.hook'
  email_configs:
  - to: 'admin@yourdomain.com'
    headers:
      Subject: '🚨 {{ .GroupLabels.alertname }} Alert'
    text: |
      {{ range .Alerts }}
      Alert: {{ .Annotations.summary }}
      Description: {{ .Annotations.description }}
      {{ end }}

  webhook_configs:
  - url: 'http://localhost:3001/alerts'
    send_resolved: true

# monitoring/alert-rules.yml
groups:
- name: docker-alerts
  rules:
  - alert: ContainerDown
    expr: up == 0
    for: 30s
    labels:
      severity: critical
    annotations:
      summary: "Scrape target {{ $labels.instance }} is down"
      description: "The target has been unreachable for more than 30 seconds"

  - alert: HighMemoryUsage
    expr: (container_memory_usage_bytes / (container_spec_memory_limit_bytes > 0)) * 100 > 90
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High memory usage on {{ $labels.name }}"
      description: "Memory usage is above 90%"

  - alert: HighCPUUsage
    expr: rate(container_cpu_usage_seconds_total[5m]) * 100 > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High CPU usage on {{ $labels.name }}"
      description: "CPU usage is above 80%"
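
The HighMemoryUsage rule comes down to simple arithmetic: usage divided by limit, times 100, compared with 90. Working it through by hand shows why a container without a limit (a limit of 0) has to be excluded. This is a sketch with a made-up helper name, using integer math only:

```shell
# mem_usage_pct USAGE_BYTES LIMIT_BYTES
# Prints the integer percentage of the limit that is in use.
# Returns 1 when no limit is set (LIMIT_BYTES is 0), because the ratio is undefined.
mem_usage_pct() (
  usage=$1 limit=$2
  [ "$limit" -gt 0 ] || return 1
  echo $((usage * 100 / limit))
)

# 256 MiB used out of a 512 MiB limit; this prints 50
mem_usage_pct 268435456 536870912
```

A container using 461 MiB of a 512 MiB limit comes out at 90%, which is the point where the alert starts counting its 5-minute window.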

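Real-World Case Study: Deploying an E-Commerce System

Let's use a complete e-commerce system to show Docker in real use:

Project Architecture

E-commerce system architecture:
├── Frontend (React + Nginx)
├── API gateway (Kong)
├── User service (Node.js)
├── Product service (Python Flask)
├── Order service (Java Spring Boot)
├── Payment service (Go)
├── Databases (PostgreSQL + Redis + MongoDB)
├── Message queue (RabbitMQ)
├── Search engine (Elasticsearch)
└── Monitoring (Prometheus + Grafana)

Complete Deployment Configuration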

# docker-compose.prod.yml
version: '3.8'

services:
  # API gateway
  kong:
    image: kong:latest
    environment:
      KONG_DATABASE: "off"
      KONG_DECLARATIVE_CONFIG: /kong/declarative/kong.yml
      KONG_PROXY_ACCESS_LOG: /dev/stdout
      KONG_ADMIN_ACCESS_LOG: /dev/stdout
      KONG_PROXY_ERROR_LOG: /dev/stderr
      KONG_ADMIN_ERROR_LOG: /dev/stderr
      KONG_ADMIN_LISTEN: 0.0.0.0:8001
    volumes:
      - ./kong/kong.yml:/kong/declarative/kong.yml
    ports:
      - "80:8000"
      - "443:8443"
      - "8001:8001"
    networks:
      - kong-net

  # Frontend application
  frontend:
    build:
      context: ./frontend
      dockerfile: Dockerfile.prod
    volumes:
      - ./frontend/nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - kong
    networks:
      - kong-net
    deploy:
      replicas: 2

  # User service
  user-service:
    build: ./services/user-service
    environment:
      NODE_ENV: production
      DATABASE_URL: postgresql://postgres:${POSTGRES_PASSWORD}@postgres:5432/users
      REDIS_URL: redis://:${REDIS_PASSWORD}@redis:6379
      JWT_SECRET: ${JWT_SECRET}
    depends_on:
      - postgres
      - redis
    networks:
      - backend-net
      - kong-net
    deploy:
      replicas: 3

  # Product service
  product-service:
    build: ./services/product-service
    environment:
      FLASK_ENV: production
      DATABASE_URL: postgresql://postgres:${POSTGRES_PASSWORD}@postgres:5432/products
      ELASTICSEARCH_URL: http://elasticsearch:9200
    depends_on:
      - postgres
      - elasticsearch
    networks:
      - backend-net
      - kong-net
    deploy:
      replicas: 2

  # Order service
  order-service:
    build: ./services/order-service
    environment:
      SPRING_PROFILES_ACTIVE: production
      SPRING_DATASOURCE_URL: jdbc:postgresql://postgres:5432/orders
      SPRING_DATASOURCE_USERNAME: postgres
      SPRING_DATASOURCE_PASSWORD: ${POSTGRES_PASSWORD}
      RABBITMQ_HOST: rabbitmq
    depends_on:
      - postgres
      - rabbitmq
    networks:
      - backend-net
      - kong-net
    deploy:
      replicas: 2

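  # Payment service (credentials must match the mongo and rabbitmq services defined below)
  payment-service:
    build: ./services/payment-service
    environment:
      GO_ENV: production
      MONGO_URI: mongodb://root:${MONGO_PASSWORD}@mongo:27017/payments?authSource=admin
      RABBITMQ_URL: amqp://admin:${RABBITMQ_PASSWORD}@rabbitmq:5672/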
    depends_on:
      - mongo
      - rabbitmq
    networks:
      - backend-net
      - kong-net

  # PostgreSQL primary database
  postgres:
    image: postgres:14
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_MULTIPLE_DATABASES: users,products,orders
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./postgres/init-multiple-databases.sh:/docker-entrypoint-initdb.d/init-multiple-databases.sh
    networks:
      - backend-net

  # Redis cache
  redis:
    image: redis:7-alpine
    command: redis-server --appendonly yes --requirepass ${REDIS_PASSWORD}
    volumes:
      - redis_data:/data
    networks:
      - backend-net

  # MongoDB
  mongo:
    image: mongo:5
    environment:
      MONGO_INITDB_ROOT_USERNAME: root
      MONGO_INITDB_ROOT_PASSWORD: ${MONGO_PASSWORD}
    volumes:
      - mongo_data:/data/db
    networks:
      - backend-net

  # RabbitMQ message queue
  rabbitmq:
    image: rabbitmq:3-management
    environment:
      RABBITMQ_DEFAULT_USER: admin
      RABBITMQ_DEFAULT_PASS: ${RABBITMQ_PASSWORD}
    volumes:
      - rabbitmq_data:/var/lib/rabbitmq
    ports:
      - "15672:15672"  # Management UI
    networks:
      - backend-net

  # Elasticsearch search engine
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.15.0
    environment:
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    volumes:
      - elasticsearch_data:/usr/share/elasticsearch/data
    networks:
      - backend-net

  # Prometheus monitoring
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    ports:
      - "9090:9090"
    networks:
      - monitoring-net

  # Grafana dashboards
  grafana:
    image: grafana/grafana:latest
    environment:
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD}
    volumes:
      - grafana_data:/var/lib/grafana
      - ./monitoring/grafana:/etc/grafana/provisioning
    ports:
      - "3001:3000"
    networks:
      - monitoring-net

volumes:
  postgres_data:
  redis_data:
  mongo_data:
  rabbitmq_data:
  elasticsearch_data:
  prometheus_data:
  grafana_data:

networks:
  kong-net:
    driver: overlay
  backend-net:
    driver: overlay
    internal: true
  monitoring-net:
    driver: overlay

Deployment Script

#!/bin/bash
# deploy-ecommerce.sh

set -e

# Configuration
PROJECT_NAME="ecommerce"
ENVIRONMENT="production"
COMPOSE_FILE="docker-compose.prod.yml"

echo "🚀 Deploying the e-commerce system to production..."

# 1. Check the environment file
if [ ! -f .env.prod ]; then
    echo "❌ Missing .env.prod file"
    exit 1
fi

# Export every variable so docker-compose can substitute ${...} in the compose file
set -a
source .env.prod
set +a

# 2. Create a backup
echo "📦 Creating a data backup..."
./backup.sh

# 3. Pull the latest images
echo "📥 Pulling the latest images..."
docker-compose -f "$COMPOSE_FILE" pull

# 4. Build the service images
echo "🔨 Building service images..."
docker-compose -f "$COMPOSE_FILE" build --no-cache

# 5. Start the base services (databases and so on)
echo "🗃️ Starting base services..."
docker-compose -f "$COMPOSE_FILE" up -d postgres redis mongo rabbitmq elasticsearch

# Wait for the databases to start
echo "⏳ Waiting for the databases to become ready..."
sleep 30

# 6. Run database migrations
echo "🔄 Running database migrations..."
docker-compose -f "$COMPOSE_FILE" run --rm user-service npm run migrate
docker-compose -f "$COMPOSE_FILE" run --rm product-service python migrate.py
docker-compose -f "$COMPOSE_FILE" run --rm order-service java -jar app.jar --migrate

# 7. Start the application services
echo "🖥️ Starting application services..."
docker-compose -f "$COMPOSE_FILE" up -d user-service product-service order-service payment-service

# 8. Start the gateway and frontend
echo "🌐 Starting the gateway and frontend..."
docker-compose -f "$COMPOSE_FILE" up -d kong frontend

# 9. Start the monitoring services
echo "📊 Starting monitoring services..."
docker-compose -f "$COMPOSE_FILE" up -d prometheus grafana

# 10. Health checks
echo "🔍 Running health checks..."
sleep 60

services=("user-service" "product-service" "order-service" "payment-service")
for service in "${services[@]}"; do
    # -T disables TTY allocation, which scripts and CI jobs do not have
    if ! docker-compose -f "$COMPOSE_FILE" exec -T "$service" curl -f http://localhost/health; then
        echo "❌ $service failed its health check"
        echo "🔄 Rolling back..."
        ./rollback.sh
        exit 1
    fi
done

# 11. Remove old images
echo "🧹 Removing old images..."
docker image prune -f

echo "✅ Deployment complete!"
echo "🌍 Application URL: https://yourdomain.com"
echo "📊 Monitoring URL: http://yourdomain.com:3001"
echo "🐰 RabbitMQ management: http://yourdomain.com:15672"
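
One detail in this script is easy to miss: a plain `source` only sets shell variables, and child processes such as `docker-compose` see them only if they are exported. A minimal sketch of loading an env file with export turned on is shown below; `load_env_file` is a hypothetical helper and it only suits files written in shell syntax (`KEY=value`, no spaces around the equals sign):

```shell
# load_env_file FILE
# Sources FILE with "allexport" enabled, so every assignment in it
# becomes an environment variable that child processes inherit.
load_env_file() {
  [ -f "$1" ] || { echo "missing env file: $1" >&2; return 1; }
  set -a
  # A bare filename is looked up in PATH by ".", so anchor it to the current directory
  case $1 in
    */*) . "$1" ;;
    *)   . "./$1" ;;
  esac
  set +a
}

load_env_file .env.prod || echo "skipping: no .env.prod in this directory"
```

After a successful call, anything the script launches, including `docker-compose`, can read the values from its environment.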

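Docker Best Practices Summary

Image Build Best Practices

  1. Use official base images

  2. Use multi-stage builds to reduce image size

  3. Use .dockerignore sensibly

  4. Minimize the number of layers

  5. Run as a non-root user

  6. Apply meaningful tags

Security Best Practices

  1. Update base images regularly

  2. Scan images for vulnerabilities

  3. Apply resource limits

  4. Isolate networks

  5. Manage secrets properly

  6. Use a read-only filesystem

Operations Best Practices

  1. Health checks

  2. Centralized log management

  3. Monitoring and alerting

  4. Automated backups

  5. Disaster recovery

  6. Version management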

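Summary

Docker is not some arcane technology. At its core it containerizes applications and guarantees environment consistency. With the material in this article, you can:

  • Get up and running quickly: from installation to basic usage

  • Build production-grade images: Dockerfile best practices

  • Deploy complex applications: multi-service orchestration with Docker Compose

  • Deploy to servers: options ranging from a single host to a cluster

  • Operate and monitor: logging, monitoring, backup, and restore

  • Harden security: container security and network isolation

Remember one thing: Docker's value lies not in the technology itself but in solving the environment consistency problem, which makes deployments simple and reliable.

So start practicing now. Docker really isn't as complicated as it seems!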