Restoring a Hive-Engine Full Node: How I Fought MongoDB's WiredTiger Throttle and Won
I thought this would be easy.
I’m upgrading my Hive Engine node from "Lite" to "Full." I have the hardware to make this trivial: a Dual Xeon Gold server boasting 80 logical cores and 64GB of RAM, backed by fast SSD storage.
I had a 250GB .archive snapshot ready to go. With that much horsepower, I figured I’d run a standard parallel mongorestore command, grab lunch, and be done by the afternoon.
Instead, I spent the last 24 hours fighting hidden bottlenecks, single-threaded legacy code, and MongoDB’s internal panic triggers.
Here is the autopsy of a restore gone wrong, the "Triple Pincer" method that finally broke the logjam, and the custom tracking script I wrote to keep my sanity.
Attempt 1: The "Sane Defaults" (ETA: 5 Days)
I started with what I thought was an aggressive command. I told mongorestore to use 16 parallel streams and handle decompression on the fly.
# The naive approach
mongorestore \
  -j=16 \
  --numInsertionWorkersPerCollection=16 \
  --bypassDocumentValidation \
  --drop \
  --gzip \
  --archive=hsc_snapshot.archive
I fired it up and watched the logs. It was moving... but barely.
The Diagnosis: A look at htop revealed the problem immediately. One single CPU core was pegged at 100%, while the other 79 were asleep.
The built-in --gzip flag in MongoDB tools is single-threaded. I had a Ferrari engine, and I was feeding it fuel through a coffee stirrer. It was crunching about 2GB per hour. At that rate, I'd be done next Tuesday.
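If you want to confirm this on your own box before aborting anything, per-thread CPU stats make the single hot thread obvious. A minimal check, assuming the sysstat package is installed:

# Confirm the single-threaded decompression bottleneck (assumes sysstat provides pidstat)
# -t breaks the stats down per thread, -u shows CPU usage, sampled every second
pidstat -t -u -p "$(pgrep -x mongorestore)" 1
# Expect one thread pinned near 100% while everything else idles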
Attempt 2: The pigz Pipe (CPU Unleashed)
If the built-in tool is the bottleneck, bypass it. I aborted the restore and switched to pigz (Parallel Implementation of GZip), which takes the decompression work out of mongorestore entirely and pipes the decompressed archive stream straight into mongorestore's stdin.
# The "Nuclear" Option
pigz -dc hsc_snapshot.archive | mongorestore \
--archive \
-j=16 \
--numInsertionWorkersPerCollection=10 \
--bypassDocumentValidation \
--drop
CPU usage skyrocketed across all 80 cores. The intake pipe was finally wide open. Data started flying into the database.
Until it didn't.
After about 20 minutes of high speed, the restore started stuttering. It would run fast for 10 seconds, then completely stall for 30 seconds. It was faster than Attempt 1, but painfully inconsistent.
The Real Enemy: The WiredTiger "Panic Button"
Why was my powerful server stuttering? It wasn't CPU anymore. I ran mongostat 1 to look under the hood of the database engine.
The "smoking gun" was in the dirty column. It was flatlining at 20%.
Here is what that means: MongoDB’s storage engine, WiredTiger, keeps data in RAM (dirty cache) before writing it to disk. It has safety triggers:
- At 5% dirty, background threads start lazily flushing data to disk.
- At 20% dirty, it hits the panic button. It decides the disk can't keep up. To prevent crashing, it forces the application threads (my restore workers) to stop inserting data and help flush the cache to disk instead.
My 80 cores were feeding data in faster than the SSD could swallow it. WiredTiger was throttling my CPU to protect the disk.
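If you want to watch that pressure directly instead of through mongostat, the raw counters live in serverStatus. A rough one-liner, assuming mongosh is on the PATH and using the stat names WiredTiger currently reports:

# Print the dirty-cache fill level as a percentage (field names from serverStatus().wiredTiger.cache)
mongosh --quiet --eval '
  const c = db.serverStatus().wiredTiger.cache;
  const dirty = c["tracked dirty bytes in the cache"];
  const max = c["maximum bytes configured"];
  print((100 * dirty / max).toFixed(1) + "% dirty");
'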
I tried to tune this live with db.adminCommand, raising the target to 20% and the panic trigger to 30%, but it didn't help much. I was stuck.
db.adminCommand({
  "setParameter": 1,
  "wiredTigerEngineRuntimeConfig": "eviction_dirty_target=20,eviction_dirty_trigger=30"
})
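You can at least confirm the new config string was accepted: wiredTigerEngineRuntimeConfig is a regular server parameter, so getParameter should echo back whatever you last set. A quick check, assuming mongosh:

# Verify the runtime config string actually applied (assumes mongosh is available)
mongosh --quiet --eval 'printjson(db.adminCommand({ getParameter: 1, wiredTigerEngineRuntimeConfig: 1 }))'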
The Final Solution: The "Triple Pincer" Attack
If I couldn't tune the engine to accept one massive stream, I decided to overwhelm it with three smaller ones.
The Hive Engine database is dominated by two massive collections: hsc.chain and hsc.transactions. Restore everything in one process and dozens of insertion workers pile onto those two collections, and the moment WiredTiger forces eviction, they all stall together.
I aborted everything and launched three simultaneous restore processes in separate terminals.

Terminal 1 (The Chain):
pigz -dc hsc_snapshot.archive | mongorestore --archive --nsInclude="hsc.chain" --drop --numInsertionWorkersPerCollection=10 --bypassDocumentValidation
Terminal 2 (The Transactions):
pigz -dc hsc_snapshot.archive | mongorestore --archive --nsInclude="hsc.transactions" --drop --numInsertionWorkersPerCollection=10 --bypassDocumentValidation
Terminal 3 (Everything Else):
pigz -dc hsc_snapshot.archive | mongorestore --archive --nsExclude="hsc.chain" --nsExclude="hsc.transactions" --drop --numInsertionWorkersPerCollection=10 --bypassDocumentValidation
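If you would rather not juggle three terminals, the same three-way split can be launched from one script and waited on together. A sketch, reusing the exact flags above:

#!/bin/bash
# Sketch: run the three-way "pincer" restore from a single script instead of three terminals
ARCHIVE=hsc_snapshot.archive
COMMON=(--archive --drop --numInsertionWorkersPerCollection=10 --bypassDocumentValidation)

pigz -dc "$ARCHIVE" | mongorestore "${COMMON[@]}" --nsInclude="hsc.chain" &
pigz -dc "$ARCHIVE" | mongorestore "${COMMON[@]}" --nsInclude="hsc.transactions" &
pigz -dc "$ARCHIVE" | mongorestore "${COMMON[@]}" --nsExclude="hsc.chain" --nsExclude="hsc.transactions" &

wait   # block until all three restores have finished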
Why this works:
Yes, this reads the 250GB archive from disk three times simultaneously. But sequential reads are cheap for an SSD; the pressure in this workload is all on the write side.
By splitting the job, the three restores no longer stall as one unit. The process restoring chain doesn't care if the transactions process is paused for cache eviction, and the overall I/O pattern smoothed out.
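You can sanity-check that read headroom while the three restores run; extended iostat output shows read versus write throughput and device utilization (assuming the sysstat package again):

# Watch read vs. write throughput and %util on the data disk (assumes sysstat provides iostat)
# -x = extended stats, -m = megabytes per second, sampled every 2 seconds
iostat -xm 2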
The Proof:
Looking at my mongostat now (top right pane in the screenshot), the insert rate is holding steady in the thousands, and the dirty column is hovering around 15%, comfortably below the 20% panic threshold.
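If you only want the handful of columns that matter here, mongostat can be trimmed with its -o flag; the field names below are the defaults from recent tool versions, so adjust if yours complains:

# Watch just the interesting columns once per second
mongostat -o 'insert,dirty,used,flushes,qrw' 1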
The Tooling: Tracking the Invisible
There was one final problem. Because I was piping the data through pigz, mongorestore had no idea how big the archive was (not that its own progress reporting helps much anyway), so I had zero progress bars. Restoring a Hive-Engine node is a slog, and there is nothing to tell you where you are...
Where there's a Linux, there's a way.
Everything is a file: lsof can show you the kernel's current read offset on the archive's file descriptor, stat gives you the total size in bytes, and from those two numbers the math is trivial.
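The manual version, before wrapping it in a script, is just two commands and a division:

# Current read offset of the archive (value in the OFFSET column, prefixed 0t or 0x)
lsof -o -p "$(pgrep -x pigz)" | grep '\.archive'
# Total archive size in bytes
stat -c%s hsc_snapshot.archive
# progress % = offset / total * 100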

So, I wrote track_restore.sh. This script auto-detects the pigz process, finds the open file descriptor using lsof, reads the byte offset from the kernel, and calculates the real-time progress. It works with the normal mongorestore method as well, and would probably be helpful to other Hive-Engine node operators (even light nodes).
You can see it running, keeping me sane while the gigabytes churn.

#!/bin/bash
# Configuration
INTERVAL=5

# AUTO-DETECT: Check for pigz first (fast mode), then mongorestore (slow mode)
PID=$(pgrep -x "pigz" | head -n 1)
PROC_NAME="pigz"
if [ -z "$PID" ]; then
  PID=$(pgrep -x "mongorestore" | head -n 1)
  PROC_NAME="mongorestore"
fi
if [ -z "$PID" ]; then
  echo "Error: Neither pigz nor mongorestore process found."
  exit 1
fi

echo "--- Restore Progress Tracker (V3) ---"
echo "Monitoring Process: $PROC_NAME (PID: $PID)"

# Find file and size
ARCHIVE_PATH=$(lsof -p "$PID" -F n | grep ".archive$" | head -n 1 | cut -c 2-)
if [ -z "$ARCHIVE_PATH" ]; then
  echo "Could not auto-detect .archive file. Is the restore running?"
  exit 1
else
  TOTAL_SIZE=$(stat -c%s "$ARCHIVE_PATH")
  echo "Tracking File: $ARCHIVE_PATH"
fi

TOTAL_GB=$(echo "scale=2; $TOTAL_SIZE / 1024 / 1024 / 1024" | bc)
echo "Total Archive Size: $TOTAL_GB GB"
echo "----------------------------------------"

while true; do
  # 1. Get Offset
  # 2>/dev/null suppresses "lsof: WARNING" noise
  RAW_OFFSET=$(lsof -o -p "$PID" 2>/dev/null | grep ".archive" | awk '{print $7}')

  # 2. Safety Check: If empty, assume finished or closing
  if [ -z "$RAW_OFFSET" ]; then
    echo -e "\n\nRestore finished! (Process closed file)"
    break
  fi

  # 3. Clean the Offset (The Fix)
  # lsof reports the offset as 0t<decimal> or 0x<hex> depending on the process
  if [[ "$RAW_OFFSET" == 0x* ]]; then
    # Hex (mongorestore style); bash arithmetic handles the 0x prefix natively
    CURRENT_BYTES=$((RAW_OFFSET))
  else
    # Likely 0t (pigz style) or a raw number; strip the 0t prefix
    CURRENT_BYTES=$(echo "$RAW_OFFSET" | sed 's/^0t//')
  fi

  # 4. Math Safety Check (don't busy-loop if the offset was unreadable)
  if [ -z "$CURRENT_BYTES" ]; then
    sleep $INTERVAL
    continue
  fi

  # 5. Calculate
  PERCENT=$(echo "scale=4; ($CURRENT_BYTES / $TOTAL_SIZE) * 100" | bc)
  CURRENT_GB=$(echo "scale=2; $CURRENT_BYTES / 1024 / 1024 / 1024" | bc)

  # 6. Bar
  BAR_WIDTH=50
  # Use 0 if PERCENT is empty to avoid a crash
  INT_PERCENT=$(echo "${PERCENT:-0}" | cut -d'.' -f1)
  # Ensure INT_PERCENT is a number
  if ! [[ "$INT_PERCENT" =~ ^[0-9]+$ ]]; then INT_PERCENT=0; fi
  FILLED=$((INT_PERCENT * BAR_WIDTH / 100))
  EMPTY=$((BAR_WIDTH - FILLED))
  BAR=$(printf "%0.s#" $(seq 1 $FILLED))
  SPACE=$(printf "%0.s-" $(seq 1 $EMPTY))

  printf "\rProgress: [%s%s] %s%% (%s GB / %s GB)" "$BAR" "$SPACE" "$PERCENT" "$CURRENT_GB" "$TOTAL_GB"
  sleep $INTERVAL
done
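Usage is nothing fancy: save it as track_restore.sh next to the restore, make it executable, and leave it running in a spare terminal:

# Run the tracker in its own terminal while the restore churns
chmod +x track_restore.sh
./track_restore.sh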
The Lesson
When you throw enterprise-grade hardware at standard-grade tools, things break in weird ways.
Don't trust defaults. Monitor your bottlenecks. And if the database engine tries to throttle you, sometimes the only answer is to hit it from three directions at once.
The node should finally be synced by mid-day.
As always,
Michael Garcia a.k.a. TheCrazyGM
I am enjoying following you along as you are going through these processes from the beginning, so cool that before even sync'ing you have an incredibly useful new gist to add to our project builder!
You're a masterful coding wizard, my friend, and I very much appreciate reading your adventures. Oh, and no, I never trust defaults. 😁🙏💚✨🤙
Amazing! Mongorestore has been the bane of my existence! On NVMEs a full restore was under 20 hours with some tweaks, but your approach should make that even faster!
My bottleneck is still the SSD. I probably should invest in an NVMe at some point. But I'm poor folk. 😅
Aren't consumer NVME and SSDs almost identical in price with SSDs being slightly cheaper?
Pretty close, but I don't have an M.2 adapter either (which I think you can get pretty dirt cheap too), and this SSD was a gift.
Not making excuses, it's just I live week to week, hand to mouth, and barely, if ever, have "extra" to get anything.
Fair enough if you already have it. 0 additional cost is best :)
👀 What FS for the mongoDB please :D
I'm using BTRFS with CoW turned off for the mongod dir, but I wanted the ease of snapshots for backup purposes, so I never have to do this restore ever again...
Interesting... haven't done that for a while. Have you tried in-memory with ZFS delayed writes?
Honestly, I haven't setup ZFS in, I wanna say about 5 years. Best thing to ever come out of Solaris. But this machine is my "dev machine" so it gets scrambled all the time, probably worth trying.
ZFS has some interesting performance trends on higher memory bandwidth and high core count mobos... especially when you can delay writebacks if that's a SSD/NVMe problem.
So, for a single mobo, a must! For cluster backends, it depends a lot more... as it's a DDoS game (and highly depends on fabric being IP-based or RDMA).
With ASIC cards and SAS stuff, you can overcome some of these problems, but memory ZFS stuff is becoming way interesting. Especially when comparing with PCIe speeds.
That's def worth looking into. Even with every tweak I could throw at it, I ran into a physical bottleneck: the SSD can only eat it so fast. It's pegged at 100% util and "still going to take for fucking ever".
Depending on the SSD, you can also tune the queue and block size to perform to its best.
NVMe's are better because of their RAM cache and controllers' further "atomic" stuff, but SAS enterprise cards' stuff does many more atomic commands, which, when used on SAS controllers, drops the latency of high IO/ps quite a lot. So, depends on what you have...
New BIG NVMe's are a different beast, I am still exploring... and they are almost like a computer! It's going to be a game on PCIe over those...
Now I know I need to meet you in real life. Enough proof from your posts. 😉🙃