Hive API Nodes, Spiders, and HAF 2.0

One of the topics Blocktrades talked about in the monthly Hive core meeting, and I put it into my weekly report from yesterday, is the recent slowness of the network and what they identified as a potential cause.

As an almost 6-year-old member of this community, I am used to various forms of growing pains. That's why a little slowness (or even failure to load comments altogether, which makes the problem worse, because another call to retrieve the post and the comments needs to be made when you refresh) didn't bother me. I barely noticed it.

But someone who just came over from the Web 2 world would surely notice it, and they don't have any roots or friends around here yet.

Unfortunately, it's not the first time our Hive API nodes have been overwhelmed. This time, some spiders were temporarily blocked to ease the situation. I don't know any details about which spiders were blocked or what they were looking for. Usually, these kinds of spiders are made smartly, to not interfere with the regular activity of the websites that are spidered. But I guess some small-scale, focused spiders are not only impolite but outright rude.


However, if we consider the difference in speed between a centralized database and an API node pulling information from the chain or a database, the spiders might have been well programmed; it's just a matter of how long their spidering takes and how much load it puts on the API nodes.
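For an idea of what "polite" means in practice: a well-behaved crawler spaces out its requests to each host so the crawl adds minimal load. A minimal sketch in Python — the class name, delay value, and host string below are mine for illustration, not anything the blocked spiders necessarily used:

```python
import time

class PoliteFetcher:
    """Spaces out requests per host so a crawl adds minimal load."""

    def __init__(self, min_delay_seconds=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay_seconds
        self.clock = clock
        self.sleep = sleep
        self.last_request = {}  # host -> timestamp of the last request

    def wait_turn(self, host):
        """Block until at least min_delay has passed since the last request to host."""
        now = self.clock()
        last = self.last_request.get(host)
        if last is not None:
            remaining = self.min_delay - (now - last)
            if remaining > 0:
                self.sleep(remaining)
        self.last_request[host] = self.clock()

# Usage: call wait_turn(host) before every request to that host.
fetcher = PoliteFetcher(min_delay_seconds=2.0)
```

A rude spider simply skips this step (and robots.txt) and fires requests as fast as the server will answer them, which is how a small, focused crawl can still hammer an API node.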

Right now, after further investigation as a simple user on my part, it seems the slowness only appears on PeakD and hive.blog among the major front ends. INLEO and Ecency were very fast. But I think it may be an irrelevant test: I believe Ecency runs a private full node too, so both INLEO and Ecency could be serving information from their own nodes without making calls to a different API node. That would make the operation as quick as accessing a local database, if I understand things correctly. Of course, their nodes still need to sync with the rest of the network, but for relatively busy front ends (for Hive's size), it's much better than making an API call for every operation of every user.

What I like is that Blocktrades said the new HAF 2.0 has sped up some calls, particularly get_account_history, one of the most resource-intensive calls, in some cases by 100x. Now the question is: who uses HAF 2.0? Is it used in production already?
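For context, this is roughly what such a call looks like on the wire: a JSON-RPC 2.0 request POSTed to a public API node such as https://api.hive.blog. A sketch using the condenser_api variant of the call — the helper function name is mine, and the account is just an example:

```python
import json

def account_history_request(account, start=-1, limit=100, request_id=1):
    """Build the JSON-RPC 2.0 body for Hive's get_account_history call.

    start=-1 means "begin from the most recent operation"; limit caps how
    many operations one call returns, which is why the call is heavy for
    busy accounts.
    """
    return {
        "jsonrpc": "2.0",
        "method": "condenser_api.get_account_history",
        "params": [account, start, limit],
        "id": request_id,
    }

# To actually send it, POST this as the request body (Content-Type:
# application/json) to a node like https://api.hive.blog.
payload = account_history_request("blocktrades")
body = json.dumps(payload)
```

Every such request the node answers means scanning a potentially long operation history, which is why a 100x speedup on this one call matters.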


Want to check out my collection of posts?

It's a good way to pick what interests you.

Posted Using InLeo Alpha



25 comments

It’s crazy how I don’t know anything about blocktrades
This now feels like a lecture to me
Thanks for the class then


Do you know many things about the engineers who built the engine of your car? 😁

His name is Dan (Notestein), and Blocktrades is his username and the name of his now-defunct exchange. He is a top witness and developer for the Hive core blockchain.


At first I thought that the slowness in the frontends was my dodgy internet connection. Then I watched the latest core developer meeting video and saw that it was spider bots.

I wonder if the spidering is being done by AI companies that are hungry for training data. It would be interesting to know which accounts are being spidered.


Yeah, I am curious to see which spiders were crossing the line.

AIs will remain hungry for more data. People who know more than I do in this field say that within a few years, they will have consumed all content ever created by mankind, both written and audio-video. And the only way forward for them from that point onward is synthetic content.


Very interesting about the data consumption.

Perhaps the only way to keep data out of their hands (in the future) will be by encrypting everything. It would work well until quantum computing makes most forms of encryption useless.


It won't work. Just think of all the AI tools we have already integrated, which are fed content to produce results based on it. If content is encrypted, there are two options:

  • remove AI completely from Hive (I don't think we want that)
  • decrypt content before feeding it to the AI, which effectively defeats the point of encrypting it for this purpose.

I just meant for content (on the internet in general) that folks may not want the AI(s) to consume.


I've been finding @ecency slow too recently; I keep having to switch nodes.


Ok... I guess I was lucky when I tested it a few hours ago. It may also have helped that I wasn't on Hive much this week due to my sickness. Now I see tools, games, etc. throwing errors too.


Thanks for sharing this. I was also puzzled by the slowness and started rebooting stuff.


Hopefully, things will improve soon.


Usually, these kinds of spiders are made smartly, to not interfere with the regular activity of the websites that are spidered.

It's not always like that... I remember a few years ago spiders almost broke my hosting machine (a powerful dedicated server, not a VPS), and they were crawling only one website, and it wasn't a big one... So, they can cause problems, but I don't have information on how they messed things up for HIVE this time...


That is true, not all spiders are made smartly. Generally, we are talking about badly written spiders (or spiders intentionally made to slow down a certain website or service, or that don't care about the disruption they cause), or a bad configuration by the website/server admin. In Hive's case, I doubt it's the latter.


I did see some slowness, and I thought that it was just my internet, which was also having some problems. Someone I know who recently came back to Hive did encounter some issues. But luckily, they were eventually able to connect. HAF 2.0 sounds nice, and will hopefully be helpful if we get more users.


Yeah, Web3 is not often user-friendly. And since Hive is a lot about users, the middle layers must be built and improved for a better UX.


I have noticed that things have been loading a bit slower on PeakD, but it usually resolves within a minute or so. It's good to know that they are aware of the issue and that HAF 2.0 should solve it.


We are fortunate to have been around from a time when we could say "it usually resolves within a minute or two". Few would have that kind of patience nowadays. Everything is instant now.


Spiders are interesting creatures, but recently I watched a video on TV where one bit a woman and caused her a serious infection, and since that day they have scared me a lot.

The part about transferring data sounds quite good, though; I hope it turns out well.


I find real-life spiders kind of creepy too.

But in this case, a spider is a computer program (a bot) that keeps browsing the internet for new and updated pages, usually to index them on search engines. I guess the name comes from the fact that when it starts searching a website and following all the links - if it is well connected - it looks kind of like a spider's web (more or less).

That's the standard use of spiders nowadays. But anyone can create them for whatever purpose they want. A more controversial use case is web scraping: copying a full website, if you are not blocked.
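The link-following part of that description fits in a few lines of Python. This toy crawler walks an in-memory link graph breadth-first, standing in for fetching pages and extracting their links (the page names are made up for the example):

```python
from collections import deque

def crawl(link_graph, start):
    """Follow links breadth-first from start, visiting each page once.

    link_graph maps a page to the pages it links to -- a stand-in for
    fetching a page and extracting its links.
    """
    visited = []
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        visited.append(page)
        for link in link_graph.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

# A toy "website": pages link out and back to each other, and following
# all of those links is what gives a crawl its web-like shape.
site = {
    "/": ["/posts", "/about"],
    "/posts": ["/posts/1", "/posts/2", "/"],
    "/posts/1": ["/posts"],
}
order = crawl(site, "/")
```

A real spider replaces the dictionary lookup with an HTTP fetch plus link extraction, which is exactly where politeness (or rudeness) toward the server comes in.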


I noticed exactly the same thing. I put up my post a few days ago, so it's been a lot of trouble, and I still have a lot of problems uploading videos. Things will have to be upgraded further, because if the market turns bullish, there will be a lot of new users on the platform.
