Hive Node Setup for the Smart, the Dumb, and the Lazy.

Hive consensus node - simple way

Requirements:

Hardware: x86-64, 32GB RAM, 1TB fast storage (SSD / NVMe)
Software: Ubuntu 22.04 LTS

Assumptions:

We act as user hive with uid 1000 and HOME=/home/hive
We use screen for convenience.
We use /home/hive/datadir as a data dir for our node.

Use cases:

Simple, yet versatile configuration that can be used to spawn a node that serves as a:

seed

Takes part in the P2P network. By default it listens on the publicly available TCP port 2001.

witness

Witnesses, a.k.a. block producers, play an essential role on Hive. In this case, you don't want to open webserver ports to the public or enable non-essential plugins such as account_history. Make sure that you set values for witness and private-key.
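
In config.ini that means entries along these lines (the account name and key below are placeholders, not real values; use your own witness account and its private signing key):

plugin = witness
witness = "yourwitnessaccount"
private-key = 5Jxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx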

exchange

Exchanges need to track account history entries for the accounts they use for deposits and withdrawals. For that reason, such accounts have to be specified in the config file (see the example entries below and in the example config). Each time you add a new account to be tracked, you have to perform a replay.
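
For illustration, with the account_history_rocksdb plugin such entries look roughly like this (account names are placeholders; the example config downloaded below contains the authoritative entries):

plugin = account_history_rocksdb
account-history-rocksdb-track-account-range = ["yourexchange","yourexchange"]
account-history-rocksdb-track-account-range = ["yourhotwallet","yourhotwallet"]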

personal wallet

You might want to have a node for personal needs to handle your accounts. Configure it just like the exchange, except you will track your own account(s).

basic API

A consensus node has a basic, yet powerful API. It can return useful information about the current state of the blockchain, track the head block, return blocks with the get_block API, and broadcast transactions, which might be just good enough to handle some bots or apps.
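
For example, assuming the webserver's HTTP endpoint is bound to 127.0.0.1:8090 (this binding is an assumption, check your config.ini), you can query the node with plain JSON-RPC:

curl -s --data '{"jsonrpc":"2.0","method":"condenser_api.get_dynamic_global_properties","params":[],"id":1}' http://127.0.0.1:8090
curl -s --data '{"jsonrpc":"2.0","method":"block_api.get_block","params":{"block_num":1},"id":2}' http://127.0.0.1:8090

The first call returns, among other global properties, the current head block number; the second returns block number 1.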

Prepare directory tree

mkdir -pv ~/datadir/{blockchain,snapshot} ~/bin

Use example config file

wget https://gtg.openhive.network/get/snapshot/exchange/example-exchange-config.ini -O ~/datadir/config.ini

Get hived and cli_wallet binaries

wget https://gtg.openhive.network/get/bin/hived-1.27.6 -nc -P ~/bin
wget https://gtg.openhive.network/get/bin/cli_wallet-1.27.6 -nc -P ~/bin
chmod u+x ~/bin/{hived,cli_wallet}-1.27.6

Run hived

Of course you need to make sure it won't be killed when you disconnect (use screen, or configure it as a service), and that the configuration fits your needs (tracked accounts, ports bound to public interfaces or to localhost, etc.).
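
If you go the service route, a minimal systemd unit could look roughly like this (a sketch only; the unit name and paths simply follow the assumptions above):

[Unit]
Description=Hive consensus node
After=network-online.target

[Service]
User=hive
ExecStart=/home/hive/bin/hived-1.27.6 -d /home/hive/datadir
Restart=on-failure

[Install]
WantedBy=multi-user.target

Save it as /etc/systemd/system/hived.service and enable it with sudo systemctl enable --now hived. Or simply run it in the foreground: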

~/bin/hived-1.27.6 -d /home/hive/datadir

That’s it.

It will start the sync process, during which /home/hive/datadir/blockchain/block_log and /home/hive/datadir/blockchain/block_log.artifacts will be created and updated as blocks coming from the Hive P2P network are fetched and processed. As blocks are processed, the current state is saved in the /home/hive/datadir/blockchain/shared_memory.bin file. If you track account history, there's also /home/hive/datadir/blockchain/account-history-rocksdb-storage, which is RocksDB storage with the account history data.

Optional steps and improvements

Use tmpfs for shared_memory.bin file

It's worth mentioning that /home/hive/datadir/blockchain/shared_memory.bin will be heavily accessed for reads and writes. Placing this file on tmpfs will speed up resync and replay, and will reduce I/O on the storage. The disadvantage is that it will not survive a reboot. You also need to have enough RAM / swap.
To use tmpfs, uncomment this line in the config.ini file:

# shared-file-dir = "/run/hive"

And prepare that location for storing shared_memory.bin file:

sudo mkdir /run/hive
sudo chown -Rc hive:hive /run/hive
sudo mount -o remount,size=30G /run
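
Since /run is a tmpfs that is cleared on boot, the directory (and the size remount) has to be recreated after every reboot. One optional way to handle the directory part, assuming a systemd-based setup like Ubuntu 22.04, is a tmpfiles.d entry, e.g. in /etc/tmpfiles.d/hive.conf:

d /run/hive 0755 hive hive -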

Use existing block_log

If you already have a block_log file, you can use it to speed up the process. In such a case, place it in ~/datadir/blockchain and use --replay.
You can use a block_log from another instance you run, or download one from public sources (see: https://gtg.openhive.network/get/blockchain ).
You can safely reuse a block_log from older versions.

wget https://gtg.openhive.network/get/blockchain/block_log -nc -P ~/datadir/blockchain
wget https://gtg.openhive.network/get/blockchain/block_log.artifacts -nc -P ~/datadir/blockchain

Please note that the block_log is roughly 500GB; downloading it could take a significant amount of time (6-12 hours even with a decent network connection).
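
Once the block_log (and optionally the artifacts file) is in place, start the replay, e.g.:

~/bin/hived-1.27.6 -d /home/hive/datadir --replay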

Use a snapshot

A snapshot lets you apply a blockchain state that was generated on a different machine. It's tightly bound to the version that was used to generate it and to the exact configuration (enabled plugins, etc.). Make sure that you have lbzip2 installed (sudo apt install lbzip2); regular bzip2 will also work, but lbzip2 makes use of all available CPU threads. To use a snapshot you also need a block_log that is at least as fresh as the snapshot itself.

wget https://gtg.openhive.network/get/snapshot/exchange/latest.tar.bz2 -O - | lbzip2 -dc | tar xvC /home/hive/datadir/snapshot

When using a snapshot, add --load-snapshot=latest (where 'latest' is the name of the snapshot).

TL;DR: Complete optimized recipe

screen -q # start the screen manager

mkdir -pv ~/datadir/{blockchain,snapshot} ~/bin

sudo mkdir /run/hive
sudo chown -Rc hive:hive /run/hive
sudo mount -o remount,size=30G /run

wget https://gtg.openhive.network/get/blockchain/block_log -nc -P ~/datadir/blockchain
wget https://gtg.openhive.network/get/blockchain/block_log.artifacts -nc -P ~/datadir/blockchain
wget https://gtg.openhive.network/get/snapshot/exchange/latest.tar.bz2 -O - | lbzip2 -dc | tar xvC /home/hive/datadir/snapshot
wget https://gtg.openhive.network/get/bin/hived-1.27.6 -nc -P ~/bin
wget https://gtg.openhive.network/get/bin/cli_wallet-1.27.6 -nc -P ~/bin
wget https://gtg.openhive.network/get/snapshot/exchange/example-exchange-config.ini -O ~/datadir/config.ini

sed -i '/^# shared-file-dir/s/^# //' ~/datadir/config.ini # enable tmpfs location
chmod u+x ~/bin/{hived,cli_wallet}-1.27.6

~/bin/hived-1.27.6 -d /home/hive/datadir --load-snapshot=latest

Upgrading from a previous version

If your instance is already configured this way, then the upgrade is very easy:

rm -rf /home/hive/datadir/snapshot/latest
wget https://gtg.openhive.network/get/bin/hived-1.27.6 -nc -P ~/bin
wget https://gtg.openhive.network/get/bin/cli_wallet-1.27.6 -nc -P ~/bin
chmod u+x ~/bin/{hived,cli_wallet}-1.27.6
wget https://gtg.openhive.network/get/snapshot/exchange/latest.tar.bz2 -O - | lbzip2 -dc | tar xvC /home/hive/datadir/snapshot

Stop the current instance and start it with the new binary:

~/bin/hived-1.27.6 -d /home/hive/datadir --load-snapshot=latest

Estimated times:

Sync (from scratch) - 36h
Replay (if you already have a block_log) - 18h
Load from snapshot (if you already have a block_log) - 1h

Congratulations, you have your Hive node running!




32 comments

Thanks. I followed all the steps. Once they finally got the flames under control and I was allowed to gather whatever I could salvage, people wanted answers. So I quickly pulled this post up and started reading it out loud. Even they understood.


Modern day Prometheus! ;-)


sudo mount -o remount,size=30G /run

Why 30G though? Isn't it enough to be of the size of shared_memory.bin? In that case setting both the size of shm and ram-disk to 22G should still have decent margin (4-5G).

downloading it could take a significant amount of time (6-12 hours even with a decent network connection)

12 hours is only a bit less than syncing from scratch through p2p, so downloading in that case is not a viable solution 😁

/home/hive/datadir/blockchain/block_log and /home/hive/datadir/blockchain/block_log.artifacts will be created

So, I guess the version supporting split block log is the next one, right?


Why 30G though? Isn't it enough to be of the size of shared_memory.bin? In that case setting both the size of shm and ram-disk to 22G should still have decent margin (4-5G).

The /run that I use in my setup is a system-wide place to store various run-time data, so I can't use all of it. I use higher values because I keep the same setup scripts for other nodes, and for my fully featured account history node it's already:
du -csh /run/hive/shared_memory.bin:

22G /run/hive/shared_memory.bin

But that doesn't matter much: the configured size limit doesn't pre-allocate RAM. It simply sets an upper boundary on how much space can be used.

12 hours is only a bit less than syncing from scratch through p2p, so downloading in that case is not a viable solution 😁

I'm not so sure it's just a bit less; one of my recent sync tests (6 weeks ago) took 42 hours. I'm afraid you might be too optimistic about sync speed in real-life conditions.

So, I guess the version supporting split block log is the next one, right?

Yes! :-) I can't wait for that. Unfortunately, being the most-used block_log provider, I have to wait for global adoption. Or do I? :-) Once it's officially released I will switch :-D


I'm afraid that you might be too optimistic about sync speed in real life conditions.

I just started syncing on latest develop, so I guess we will know soon enough 😄


Damn you 😡 It is still going. You were right and I remembered it wrong. I've dug out 15-month-old results of a full sync, and it was running over 37 hours up to 72M+ blocks. Compared to that, the current version appears to be slightly faster, but still a couple of times slower than what I thought it would be.

To be honest it smells like a bug (or, more optimistically, an optimization opportunity). There are a couple of hiccups when the node is not receiving blocks fast enough, but for the most part block processing is reported at close to 100% of the time. On the other hand the computer seems to be sleeping, using only around a single core, which is weird, since decomposing signatures (which used to make sync 7 times slower than replay) is, since HF26, supposedly done on multiple threads and preemptively, as soon as a block arrives, so I'd expect at least some bursts of higher CPU activity. Maybe I should use some config option for that?

It would be nice to have a comparison on the same machine: pure replay vs replay with full validation vs sync.


I love that one:

It's not a bug, it's an optimization opportunity!

It would be nice to have a comparison on the same machine: pure replay vs replay with full validation vs sync.

I will. I've added that to my long, long TODO list.

Nah, I'll start it right away.

Starting replay from scratch (using existing, fresh block_log and block_log.artifacts):

~/bin/hived-1.27.6 -d /home/hive/datadir --set-benchmark-interval=100000 --exit-before-sync --replay 2>&1 | ts -s "%s" | tee -ai benchmark-hived-v1.27.6-replay.log

Once it's finished, I will repeat it with added --validate-during-replay.
At that time, I will be able to compare it to my old sync run (performed on the same machine up to block 79795165 with version 1.27.4). But I guess it would be great to take the opportunity to compare the exact same version up to the same block height, which would be around block 87300000.

Extrapolating from the previous sync run, which completed 79.7M in 33 hours, hopefully we can complete the sync on that machine in 36.5 hours.
Seems like I will be able to present results by the end of next week.

And just to be clear, for those tests I'm using a minimalistic configuration (except for the witness plugin):

backtrace = yes

plugin = witness

shared-file-dir = "/run/hive"
shared-file-size = 28G

flush-state-interval = 0

p2p-endpoint = 0.0.0.0:2001

Signatures are checked ahead of time in separate threads, and a sufficient number of threads is allocated by default.

Whenever you see block processing at 100%, the bottleneck is the single-core speed of your system (it's processing operations and updating state).


The results are in:

  • revision: 4921cb8c4abe093fa173ebfb9340a94ddf5ace7a
  • same config in both runs (no AH or other plugins that add a lot of data, just witness and APIs, including wallet bridge)
  • in both runs 87310000 blocks were processed (actually slightly more, with replay covering around 10 extra blocks that the previous sync run added to the block log while in live sync)
  • replay with validation (from start up to Performance report (total).) - 124225649 ms, which is 34.5 hours; avg. block processing time (from Performance report at block) is 1.423 ms/block
  • sync (from start up to entering live mode) - 143988777 ms, which is 40 hours; avg. block processing time (from Syncing Blockchain) is 1.649 ms/block

I'm curious how @gtg measurements will look in comparison.

The sync-to-replay ratio shoots up the most in areas of low blockchain activity, which is understandable, since small blocks are processed faster than they can be acquired from the network, but in other areas sync is still 10-20% slower.

And the likely reason I remembered sync as faster than that is a difference in computer speed: my home computer appears to be over 60% faster than the one I was running the above experiments on, which would mean it should almost fit the sync inside 24 hours.


For now I have results for the first 50M blocks:

50M blocks          Real time   last 100k real time   last 100k CPU time   parallel speedup
Replay               6:32:45         43.466s               61.132s             x1.4064
Replay + Validate   11:03:00         84.337s              395.575s             x4.6904
Resync              14:31:33        103.266s              182.288s             x1.7652

I just counted the last 100k block times (CPU / real), so it's not a great measurement. I will have better numbers once I complete those runs. But it seems that replay with validation can somehow make better use of multiple threads than validation during resync.


It might be the state undo logic slowing down blockchain processing in a sequential manner (this computation is probably skipped for replay+validate). But I doubt there is a way to disable it to check that, short of modifying the code for the test.

Probably we should modify the code dealing with checkpoints to skip undo logic up to the checkpoint. This would allow us to confirm if it is the bottleneck, and it would also give us a speedup when checkpoints are set if it turns out to be the bottleneck.


It should be easy to test - just cut out the two lines with session in database::apply_block_extended (I'm actually assuming that out-of-order blocks won't reach that routine during sync, but if they do, it would be a source of slowdown).

I'd be surprised if undo sessions were the problem. They are relatively slow and worthy of optimization, but in relation to simple transactions, mostly custom_jsons, so their performance is significant when there are many of them, like during block production, reapplication of pending transactions, or in extreme stress tests with colony+queen. During sync we only have one session per block.


Yes, your assumption is correct: blocks are strictly processed in order during sync; the P2P code ensures this. If it's easy to test, let me know what you find out. I guess work with @gandalf so that the test is performed on the same machine as the previous measurements.


I recommend using a named session with screen as well as logging the session.

screen -S witness -L -Logfile witness.log

You can then use my monitorwitness script to know if it falls behind:

https://github.com/officiallymarky/monitorwitness


LOL

It seems like THEMARKYMARK is dealing with some personal drama and mental issues, which might be causing them to spam us and others with baby pics, GIFs, and questionable content; he's been doing the same for years. The fun doesn't stop there, as this seems to be a recurring theme with multiple online personas. The real question is, do we want to trust these drama-magnets with our hard-earned funds or any significant influence on Hive?

On Hive a significant issue exists with automatic upvotes consistently rewarding the same individuals day in and day out

We want to address the issue of downvoting. It has caused pain to many people, and we want to make sure it doesn't happen again. (reply to @jacobtothe)

We hope that those who genuinely care about Hive will reconsider their actions, as continuing down this path could inadvertently harm innocent users who are unaware of these issues

lol the Marky mark keeps dreaming

Marks dream

https://www.publish0x.com/the-dark-side-of-hive/the-marky-mark-marcus-its-clear-youre-deeply-engaged-in-farm-xmjodol

WE PRAY FOR YOU

Posted using Bilpcoin

https://www.bilpcoin.com/undefined/@bpcvoter1/lol-the-marky-mark-keeps-dreaming

Quoted from a FreeCompliments thread ("lol the Marky mark keeps dreaming"), reply by themarkymark (80):

No. I downvote spammers, you are a spammer, you get no monies. It's very simple. In the meantime go get some help. You obviously have some issues going on.

bpcvoter1: GET HELP

themarkymark: I have a personal chef, maid, and a captain for my yacht. What other help do I need?

https://hive.blog/hive-140084/@themarkymark/re-bpcvoter1-sg5l3x

@gogreenbuddy THIS MAN IS A TOP HIVE FARMER AND SCAMMER SOME OF HIS OTHER ACCOUNTS @THEMARKYMARK @BUILDAWHALE @IPROMOTE AND 100s more


Genius article - I should learn to follow it one day


Please note that the block_log is roughly 500GB, downloading it could take a significant amount of time (6-12 hours even with a decent network connection)

Seeing this, maybe it is better to have more than 1 TB storage. People often store and run multiple things on their servers.


Sure, but that depends on the use case. People who run a hived node should know what they are doing. For example, running a witness node assumes that nothing else runs on the same machine.


This is very helpful and hopefully will help more people become witnesses/node operators. !LUV


Alright, this looks rather easy, and now I'm going to be honest: I'm seriously thinking about launching one and learning from people such as yourself. I used to run a number of nodes years ago for other blockchains. Thanks for the deets!


Congratulations, you have your Hive node running!

Thank you, this was very easy and quick - I didn't even wait for 36 hours and this congratulation arrived 😜


Thank you so much for this step-by-step guide. I have longed for one written like this.

This looks easy and smart, I have reblogged it for reference purposes.

I will revisit it once I purchase a server for the set-up.
Thank you once again.


I'm glad people are doing all this but me. I do not want to blow my head up trying to comprehend this post. It's really amazing that some commenters find it easy and useful. Kudos


I'm trying to get a Hive witness node up and running on a local machine (Intel i5 8400, 64GB RAM, 1TB NVMe & 500GB NVMe). I downloaded @gtg's block_log and ran a replay. It created a block_log index that expanded to 462GB and ate all the space on the 1TB NVMe.

Is this supposed to happen? Can I move the index to the second 500GB NVMe, which is empty?

I'm using @someguy123's Hive in a Box


There's no block_log.index file anymore; it was replaced by block_log.artifacts a while ago, and regardless of the name, it shouldn't be that big. Once you've downloaded the block_log from my site, you can do the same with the artifacts file (it would save some regeneration time if you have a fast network; otherwise it might not be worth the hassle).

My instructions above are for this kind of deployment. I've never used the "in a box" stuff, so I don't know its quirks and specifics.


Congratulations @gtg! Your post has been a top performer on the Hive blockchain and you have been rewarded with this rare badge

Post with the highest payout of the day.

You can view your badges on your board and compare yourself to others in the Ranking
If you no longer want to receive notifications, reply to this comment with the word STOP


Congratulations @gtg! Your post has been a top performer on the Hive blockchain and you have been rewarded with this rare badge

Post with the highest payout of the week.

You can view your badges on your board and compare yourself to others in the Ranking
If you no longer want to receive notifications, reply to this comment with the word STOP


I have zero knowledge about this. lol


This now looks very easy
Kudos to you for your hard work!


Thanks, been looking into running a witness node. Now I just need the machine.


Thank you for the instructions :) Very useful indeed if I ever want to set up a Hive node :)
