Context


The problem is easy to state: keep a backup of every post without losing any information.

The blog system was structured in a custom way and, for that reason, I needed a backup system compatible with my architecture. If you haven't read about LearningSea's infrastructure yet, I recommend the previous posts, where I explain my technical decisions in detail.

My backup system gives me several practical benefits: per-post granularity for both backup and restore, versioning, human-readable text, and single-command restores. It also saves disk space, which keeps costs down.

Conventional alternatives


First, I'll explain why I didn't use conventional backup alternatives. The main options were pg_dump and django-dbbackup.

Both would be convenient to use, since pg_dump ships with PostgreSQL itself, while django-dbbackup integrates directly with the framework.

However, both alternatives have important limitations:

  • they perform a full database backup every time the command is run
  • the restore flow is oriented toward a complete database snapshot, with no simple granularity to restore or back up a single post
  • the content has no simple, human-friendly representation, especially in binary formats or .sql dumps
  • disk usage grows with every run, since each backup is another full copy

pg_dump has an even bigger problem: it would only copy the database information. However, posts also contain images stored in Django's MEDIA_ROOT folder. This means that, in addition to the database backup, I would still need to manually back up the images folder.

In this scenario, django-dbbackup would be more viable, since it covers both the database and Django's media folder through its own management commands (dbbackup and mediabackup).

But the limitations mentioned above still exist — and in my context, they are quite relevant.

That said, my backup system solves all of these problems. And the best part is that, most of the time, I don't even need to think about it.

Backup system overview

Every time a post is saved, the system automatically generates a complete snapshot on the filesystem — content, category, tags, and all related images. These snapshots are stored in a hierarchical folder structure and can be restored at any time.

The structure is exactly this:

backups/
└── post-slug/
    ├── post.json
    ├── post.md
    ├── images/
    └── updates/
        └── update_YYYY-MM-DD_HH-MM-SS/
            ├── post.json
            ├── post.md
            └── images/

Versioning and post.json

In the post versioning process, there is an elegant detail: images are not duplicated.

The system lists the images found in the post and includes in subsequent updates only the new images.
The same happens during the restore process — the system walks through all images found in the post's folders and reconstructs its complete state.

The heart of the backup is the post.json file, which represents the complete snapshot of the post. This file contains the exact structure of a post in the database.
The main difference of this system is that the backup is oriented toward the post entity, not the complete state of the database.

Example of a post.json:

{
  "title": "LearningSea: a record of my learning in software engineering",
  "slug": "about-learningsea",
  "excerpt": "In this first post, I introduce LearningSea, a blog where I document my learning, experiments and decisions while developing software.",
  "editor_type": "markdown",
  "content_md": "...",
  "content_html": "...",
  "status": "published",
  "created_at": "2026-03-28T04:16:16.903208+00:00",
  "updated_at": "2026-03-28T04:16:16.991055+00:00",
  "category": "Meta",
  "tags": [
    "LearningSea"
  ],
  "images": []
}
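
As a rough illustration of how such a snapshot might be assembled, the sketch below builds the same dictionary from a stand-in object. The build_post_data name and its arguments appear in the backup module; everything else is an assumption: the attribute names on the stand-in simply mirror the JSON keys above, and the real model may differ.

```python
import json
from datetime import datetime, timezone
from types import SimpleNamespace


def build_post_data(post, image_filenames, content_md=None):
    """Assemble a JSON-serializable snapshot of a post.

    The attribute names read from `post` are assumptions that mirror the
    keys of the post.json example; the real model may differ.
    """
    return {
        "title": post.title,
        "slug": post.slug,
        "excerpt": post.excerpt,
        "editor_type": post.editor_type,
        "content_md": content_md or "",
        "content_html": post.content_html,
        "status": post.status,
        # Datetimes become ISO 8601 strings so the file stays readable.
        "created_at": post.created_at.isoformat(),
        "updated_at": post.updated_at.isoformat(),
        # Related objects are flattened to their names.
        "category": post.category.name if post.category else None,
        "tags": [tag.name for tag in post.tags],
        "images": list(image_filenames),
    }


# Stand-in for a model instance, used only to make the sketch runnable.
# With a real Django model, the tags would come from post.tags.all().
example_post = SimpleNamespace(
    title="About LearningSea",
    slug="about-learningsea",
    excerpt="A short summary.",
    editor_type="markdown",
    content_html="<p>Hello</p>",
    status="published",
    created_at=datetime(2026, 3, 28, 4, 16, 16, 903208, tzinfo=timezone.utc),
    updated_at=datetime(2026, 3, 28, 4, 16, 16, 991055, tzinfo=timezone.utc),
    category=SimpleNamespace(name="Meta"),
    tags=[SimpleNamespace(name="LearningSea")],
)

snapshot = build_post_data(example_post, image_filenames=[], content_md="# Hello")
# Same json options as backup_post(): indented output, non-ASCII kept as-is.
text = json.dumps(snapshot, ensure_ascii=False, indent=2)
print(text)
```

Flattening the category and tags to plain names is what lets the restore step recreate them with get_or_create() in an empty database.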

Now, it's worth understanding the pipeline inside this system.

Pipeline


The system has two pipelines: backup and restore.

Backup

Post is saved or updated
    → Django fires the post_save signal
    → post_saved() catches the signal
    → calls backup_post(created=True/False)
        → get_image_paths() scans the post HTML looking for <img> tags
        → if it's a new post:
            → copy_images() copies all found images to backups/<slug>/images/
            → build_post_data() assembles all post fields into a dictionary
            → writes post.json to backups/<slug>/
            → build_frontmatter() builds the YAML header
            → save_markdown_file() writes post.md to backups/<slug>/
        → if it's an update:
            → creates backups/<slug>/updates/update_<timestamp>/
            → get_all_backed_up_images() reads the root post.json + all previous updates' post.json → builds a set with all already saved images
            → filters out already saved images → keeps only the new ones
            → copy_images() copies only the new images to the update folder
            → build_post_data() assembles all post fields into a dictionary
            → writes post.json to the update folder
            → build_frontmatter() builds the YAML header
            → save_markdown_file() writes post.md to the update folder

    → Django fires the m2m_changed signal (tags are saved separately)
    → post_tags_changed() catches the signal
        → checks the post age
        → less than 5 seconds → overwrites the initial backup (created=True) with the correct tags
        → 5 seconds or more → creates a new update entry (created=False)
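
The age check in post_tags_changed() can be sketched as a small pure function. This is only an illustration under assumptions: the helper name is_initial_save, the use of the post's creation timestamp to measure its age, and the window parameter are hypothetical. Only the 5-second rule comes from the pipeline above.

```python
from datetime import datetime, timedelta, timezone

# Threshold taken from the pipeline description above.
INITIAL_SAVE_WINDOW = timedelta(seconds=5)


def is_initial_save(created_at, now=None, window=INITIAL_SAVE_WINDOW):
    """Return True when a tag change belongs to the post's first save.

    True  -> overwrite the initial backup (created=True)
    False -> record a new update entry (created=False)
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - created_at) < window


created = datetime(2026, 3, 28, 4, 16, 16, tzinfo=timezone.utc)
just_after = created + timedelta(seconds=1)
much_later = created + timedelta(minutes=10)

print(is_initial_save(created, now=just_after))  # True
print(is_initial_save(created, now=much_later))  # False
```

The window exists because tags are saved separately from the post, so the snapshot written by the first post_save cannot contain them yet.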

Restore

python manage.py restore_post backups/<slug> [--update X] [--dry-run]
    → handle() runs
        → if --update was not passed:
            → get_latest_update() scans the updates/ folder → sorts alphabetically → returns the last one
        → resolves the json_path
            → update exists → backups/<slug>/updates/<update>/post.json
            → no update → backups/<slug>/post.json
        → loads post.json into memory
        → collect_all_images() walks root images/ + each update's images/ in chronological order
            → stops at the target update → ignores images from future updates
            → returns a dictionary { filename -> full path }
        → displays a preview of what will be restored
        → if --dry-run → stops here, nothing is written
        → Category.objects.get_or_create() → reuses or creates the category
        → Tag.objects.get_or_create() → reuses or creates each tag
        → Post.objects.update_or_create(slug=...) → updates if exists, creates if not
        → post.tags.set() + post.save()
        → copies all collected images to MEDIA_ROOT/uploads/
    → Done

The pipelines above already show the entire flow covered by the system. However, it's worth understanding some relevant parts a bit more deeply.

Main modules, responsibilities and functions


signals.py → reacts to Django persistence events, triggering image resize and backup pipelines
backup_post.py → performs the complete post snapshot
restore_post.py → reads backup files and reconstructs posts

Below I'll present the modules. Some functions will be omitted to keep the focus only on the main pipeline of the system.

signals.py

Django signals are the framework's internal event mechanism: they let functions run automatically when certain actions happen, such as saving a model or changing a many-to-many relation.

That is exactly what the backup system needs. When a post is saved, the backup module is called automatically.

Module function list:

  • resize_uploaded_image
  • post_saved
  • post_tags_changed

import logging

from django.db.models.signals import m2m_changed, post_save
from django.dispatch import receiver

# Project imports (Post, Attachment, backup_post, resize_image) omitted for brevity.

logger = logging.getLogger(__name__)

@receiver(post_save, sender=Attachment)
def resize_uploaded_image(sender, instance, created, **kwargs):
    # Resize only once, when the attachment is first created and has a file.
    if created and instance.file:
        resize_image(instance.file.path)

@receiver(post_save, sender=Post)
def post_saved(sender, instance, created, **kwargs):
    # Log backup failures instead of raising, so an error here does not interrupt the save.
    try:
        backup_post(instance, created=created)
    except Exception as e:
        logger.error(f'Post backup failed for {instance.slug}: {e}')

@receiver(m2m_changed, sender=Post.tags.through)
def post_tags_changed(sender, instance, action, **kwargs):
    ...

  • resize_uploaded_image() → listens to post_save from the Attachment model (Summernote); when an image is first saved, it calls an auxiliary image-resizing module (which uses Pillow)
  • post_saved() → listens to post_save from the Post model; on post creation or update, it calls backup_post()
  • post_tags_changed() → listens to m2m_changed on Post.tags.through, the relation between posts and tags; keeps the tags in the backup in sync with the post

backup_post.py

The responsibility here is to perform the complete backup of a post.

The backup_post() function acts as an orchestrator: it decides what to save and where, while the other functions in this module implement the individual steps.

Module function list:

  • get_image_paths
  • copy_images
  • get_all_backed_up_images
  • build_post_data
  • build_frontmatter
  • save_markdown_file
  • backup_post

import json
import os
from datetime import datetime

from django.conf import settings

def backup_post(post, created=False, content_md=None):
    # Every post gets its own folder: <BACKUP_ROOT>/<slug>/
    backup_root = settings.BACKUP_ROOT
    post_dir = os.path.join(backup_root, post.slug)
    os.makedirs(post_dir, exist_ok=True)

    # Fall back to the Markdown stored on the post when none was passed in.
    if not content_md and post.editor_type == 'markdown':
        content_md = post.content_markdown

    image_paths = get_image_paths(post.content)

    if created:
        # New post: the full snapshot goes into the root of the post folder.
        image_filenames = copy_images(image_paths, post_dir)
        data = build_post_data(post, image_filenames, content_md)
        json_path = os.path.join(post_dir, 'post.json')
        with open(json_path, 'w', encoding='utf-8') as f:
            json.dump(data, f, ensure_ascii=False, indent=2)
        if content_md:
            save_markdown_file(post_dir, post, content_md)
    else:
        # Update: write a timestamped snapshot under updates/.
        timestamp = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
        update_dir = os.path.join(post_dir, 'updates', f'update_{timestamp}')
        os.makedirs(update_dir, exist_ok=True)

        already_backed_up = get_all_backed_up_images(post_dir)

        # Copy only the images that no earlier snapshot already holds.
        new_image_paths = [
            p for p in image_paths
            if os.path.basename(p) not in already_backed_up
        ]
        image_filenames = copy_images(new_image_paths, update_dir) if new_image_paths else []

        data = build_post_data(post, image_filenames, content_md)
        json_path = os.path.join(update_dir, 'post.json')
        with open(json_path, 'w', encoding='utf-8') as f:
            json.dump(data, f, ensure_ascii=False, indent=2)
        if content_md:
            save_markdown_file(update_dir, post, content_md)

  • get_image_paths() → the post HTML is scanned with a regular expression to extract the image paths
  • copy_images() → copies all images related to the post from MEDIA_ROOT to the post's backup folder
  • get_all_backed_up_images() → reads the root post.json and all previous updates' post.json to build a complete set of already saved images, ensuring they are not saved twice
  • build_post_data() → assembles all post fields into a dictionary format, which is what becomes the post.json
  • build_frontmatter() → writes the YAML header used in the .md file
  • save_markdown_file() → writes the post.md file to disk
  • backup_post() → orchestrator of the backup system
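
Two of the helpers described above could look roughly like this. Treat it as a sketch under assumptions, not the actual implementation: the regular expression, the private helper _read_images, and the choice to read the "images" key of each post.json are inferred from the descriptions in this post.

```python
import json
import os
import re

# Captures the src attribute of <img> tags, with single or double quotes.
IMG_SRC_RE = re.compile(r'<img\b[^>]*?\ssrc=["\']([^"\']+)["\']', re.IGNORECASE)


def get_image_paths(html):
    """Return the src value of every <img> tag found in the HTML, in order."""
    return IMG_SRC_RE.findall(html or "")


def _read_images(json_path):
    """Return the 'images' list of one post.json, or [] if the file is missing."""
    if not os.path.exists(json_path):
        return []
    with open(json_path, "r", encoding="utf-8") as f:
        return json.load(f).get("images", [])


def get_all_backed_up_images(post_dir):
    """Collect the filenames recorded in the root snapshot and in every update."""
    backed_up = set(_read_images(os.path.join(post_dir, "post.json")))
    updates_dir = os.path.join(post_dir, "updates")
    if os.path.isdir(updates_dir):
        for update_folder in sorted(os.listdir(updates_dir)):
            json_path = os.path.join(updates_dir, update_folder, "post.json")
            backed_up.update(_read_images(json_path))
    return backed_up


html = """<p>Intro</p><img src="/media/uploads/a.png" alt="a"><img alt="b" src='/media/uploads/b.jpg'>"""
print(get_image_paths(html))  # ['/media/uploads/a.png', '/media/uploads/b.jpg']
```

Returning a set of filenames matches how backup_post() filters new images with os.path.basename(). A regular expression is enough for editor-generated HTML like this, though a real HTML parser would be more robust against unusual markup.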

restore_post.py

Django management commands are custom commands run through manage.py.
In the backup system, one of them drives the restore pipeline, so posts can be rebuilt directly from the terminal while reusing all of Django's infrastructure.

Module function list:

  • get_latest_update
  • collect_all_images

  • Command (class)

    • add_arguments
    • handle

import json
import os
import shutil

from django.conf import settings
from django.core.management.base import BaseCommand

# Project imports (Post, Category, Tag) omitted for brevity.

def get_latest_update(backup_path):
    updates_dir = os.path.join(backup_path, 'updates')
    if not os.path.exists(updates_dir):
        return None
    # Folder names embed a sortable timestamp, so the last one is the newest.
    updates = sorted(os.listdir(updates_dir))
    return updates[-1] if updates else None

def collect_all_images(backup_path, target_update=None):
    images = {}

    # Start with the images of the initial snapshot.
    root_images_dir = os.path.join(backup_path, 'images')
    if os.path.exists(root_images_dir):
        for filename in os.listdir(root_images_dir):
            images[filename] = os.path.join(root_images_dir, filename)

    # Then layer each update on top, oldest first; later copies win.
    updates_dir = os.path.join(backup_path, 'updates')
    if os.path.exists(updates_dir):
        for update_folder in sorted(os.listdir(updates_dir)):
            update_images_dir = os.path.join(updates_dir, update_folder, 'images')
            if os.path.exists(update_images_dir):
                for filename in os.listdir(update_images_dir):
                    images[filename] = os.path.join(update_images_dir, filename)

            # Stop at the target update so images from later updates are ignored.
            if target_update and update_folder == target_update:
                break

    return images
...

class Command(BaseCommand):
    def add_arguments(self, parser):
        parser.add_argument('backup_path', type=str)
        parser.add_argument('--update', type=str)
        parser.add_argument('--dry-run', action='store_true')

    def handle(self, *args, **options):
        backup_path = options['backup_path']
        update = options['update']
        dry_run = options['dry_run']

        # No --update given: fall back to the most recent update, if any.
        if not update:
            update = get_latest_update(backup_path)

        if update:
            json_path = os.path.join(backup_path, 'updates', update, 'post.json')
        else:
            json_path = os.path.join(backup_path, 'post.json')

        with open(json_path, 'r', encoding='utf-8') as f:
            data = json.load(f)

        all_images = collect_all_images(backup_path, target_update=update)

        self.stdout.write(f'Restoring: {data["title"]}')
        ...

        # With --dry-run, stop after the preview: nothing is written.
        if dry_run:
            return

        category, _ = Category.objects.get_or_create(name=data['category'])
        tags = [
            Tag.objects.get_or_create(name=tag_name)[0]
            for tag_name in data['tags']
        ]
        post, created = Post.objects.update_or_create(slug=data['slug'], defaults={...})
        post.tags.set(tags)

        # Copy every collected image back into MEDIA_ROOT/uploads/.
        for filename, src in all_images.items():
            dst = os.path.join(settings.MEDIA_ROOT, 'uploads', filename)
            shutil.copy2(src, dst)

  • get_latest_update() → scans the updates folder of a post and returns the most recent one (used when a restore does not specify any update)
  • collect_all_images() → walks the complete image history, from the root backup up to the specified update, and returns a dictionary mapping each filename to its path in the backup; when the same filename appears more than once, the most recent copy wins

  • Command (class)

    • add_arguments() → adds arguments that can be used in the CLI for this command
    • handle() → orchestrates the entire restore sequence. Allows specifying CLI arguments such as --update and --dry-run. Reconstructs the post in the database and the images on disk
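
Both get_latest_update() and collect_all_images() rely on sorted() putting the update folders in chronological order. That works because the timestamp format runs from the largest unit to the smallest with zero padding, so string order and time order agree. A small check of that property follows; the helper name update_folder_name is hypothetical, while the format string is the one used in backup_post().

```python
from datetime import datetime


def update_folder_name(moment):
    """Build a folder name in the update_YYYY-MM-DD_HH-MM-SS layout."""
    return f"update_{moment.strftime('%Y-%m-%d_%H-%M-%S')}"


moments = [
    datetime(2026, 11, 2, 8, 0, 0),
    datetime(2026, 3, 28, 4, 16, 16),
    datetime(2026, 3, 28, 4, 16, 17),
    datetime(2025, 12, 31, 23, 59, 59),
]

names = [update_folder_name(m) for m in moments]
chronological = [update_folder_name(m) for m in sorted(moments)]

# Sorting the strings gives the same order as sorting the datetimes.
print(sorted(names) == chronological)  # True
print(sorted(names)[-1])               # update_2026-11-02_08-00-00
```

A day-first format such as DD-MM-YYYY would break this, since the day would dominate the string comparison.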

Conclusion


The blog can be rebuilt and the configurations can be redone. What cannot be recovered is what I wrote.
The backup system exists to protect exactly that.

A 20 GB VPS forced me to build something efficient. The result is a system that writes a snapshot only when a post changes and never copies the same image twice, which would have been the sensible approach even without the constraint.

Besides that, I wanted backups that are readable, versionable and easy to restore in any environment. My system delivers exactly that.