We Will Not Have A Data Problem

Data. It is the lifeblood of all AI development.

Some are now theorizing that we will run out of data. This was first projected to occur in 2024, a date that many estimates have since pushed out to 2027 or 2028.

Are we headed for a data crunch? This is a worthwhile topic to dig into.

In this article we will look at where we are and what is likely to happen.

Data: The Fuel of the Digital Age

We cannot deny the fact that we generate a massive amount of data. This is something that started years ago. The first conversation centered around the cost of storage. Fortunately, the prices of hard drives have consistently been dropping, forming their own "Moore's Law" albeit not quite as fast.

With that problem resolved, the feeding of LLMs has some concerned. The growth in this arena is mind-numbing, with total GPU compute doubling every 6 months. We can think of this as Moore's Law on steroids.
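
To put that pace in perspective, here is a rough sketch of the arithmetic. The 6-month doubling is the figure cited above; the 24-month doubling for Moore's Law is an assumption (the common "roughly every two years" reading), and the function name is made up for illustration.

```python
def growth_factor(months_elapsed: float, doubling_months: float) -> float:
    """Multiplier after `months_elapsed` if capacity doubles every `doubling_months`."""
    return 2 ** (months_elapsed / doubling_months)


GPU_DOUBLING_MONTHS = 6     # figure cited in this article
MOORE_DOUBLING_MONTHS = 24  # assumption: Moore's Law read as a ~2-year doubling

for years in (1, 2, 3):
    months = 12 * years
    gpu = growth_factor(months, GPU_DOUBLING_MONTHS)
    moore = growth_factor(months, MOORE_DOUBLING_MONTHS)
    print(f"{years} yr: GPU compute x{gpu:.0f}, Moore's Law x{moore:.1f}")
```

Under those assumptions, three years of 6-month doublings works out to a 64x increase, against less than 3x for a 2-year doubling.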

If compute keeps growing, how are we going to feed it? This is a chart put together by Brian Wong, detailing what he sees as the potential growth of compute used by xAI for Grok.

The 100K H100s are already in place. The Blackwell series was already stated as on order, at least the first 100K. Some of the rest of these might already have orders in. To be honest, this is not something that I follow as closely as he does.

NVIDIA has put the specs for Rubin out. Next is not something I am familiar with, but that doesn't mean it isn't accurate.

How accurate these projections are is not relevant for our discussion. The important point is that we are seeing a massive amount of compute going in over the next couple of years. We are looking at billions of dollars for these systems.

Of course, this is across the board. We will likely see the same thing with Meta, Google, Amazon, and OpenAI. TSMC is going to be very busy.

Beyond that, we come back to the data question.

Where are we going to get this from?

Data Explosion

The topic of data has to be broken down.

One key to this is whether the data is structured or not. Structure is what truly gives it value. Of course, we have to start somewhere, so unstructured data is used. However, it has to be manipulated, i.e. labelled, to be of major use.

Fortunately, much of the focus so far has been on text. There is still a lot of video and audio which has not been fed in. Multimodal is relatively new. Over the next few years, it is going to be used to a much larger degree.

Even with this, however, how are we going to have enough to feed these systems in 2026 or 2027?

Here is where we could see a massive data explosion.

Real world AI is the next hotbed. This is going to be embedded in robots and different items that move throughout our spaces. This will add another dimension completely.

It will also send the volume of data skyrocketing.

There is a ton of data that we do not collect. This is generated by the entire global population but it never makes it onto the internet. Consider what we see while just sitting on a park bench. That data is imprinted in our minds but we don't upload it to the cloud (at least not yet).

Consider what is "seen" by robots through their cameras. From the prototypes I saw on video, there are at least 5 cameras on each robot. It is possible there are more. We also have to factor in the number of sensors throughout the bot.

All of this will generate a massive amount of data. It is going to create a feedback loop for these entities. A great deal of that compute will be fed data of this nature.

One big advantage is that it can be set up to be structured from the start. Hence, we are dealing with quality data from the beginning as the producers of the robots will know how to design the capture to best integrate into their system.

This is the direction we are heading. Meta trained Llama 3.1, their latest model, on 16K H100s, the largest to date. Their next version is certainly going to top 100K as well. We can see this across the board.

Compute is nothing without data. There are a lot of companies out there that want it. Big Tech is going to get it.

My question, as always: what about the startups? That is something that we have to keep considering.


Posted Using InLeo Alpha



12 comments

This really got me thinking. With AI advancing so fast, it feels like startups might struggle to keep up with the data demands. And I feel they just have to strive to ensure they are not left behind in the game.

That is true. It is what I have been writing about. That is why something like Hive is crucial. It is the democratization of data. With the internet being progressively locked down by the companies that own the platforms, what are startups to do?

It is why Hive offers a solution. Nobody owns the data.

Yeah, that's correct. Hive is very different from others online, like BTC and others that are owned by one or two.

Well, Bitcoin is mostly a database of financial transactions. It doesn't contain text data in the same way Hive does.

Hmm, that's right, thanks for letting me know this and enlightening me further. You are indeed a pro in this line. I give it to you.🫡🫡

It will be interesting to see how much of the data that the robots collect will be used. Technically, they can already use the different surveillance feeds from different sources. Google Street View has a lot of data available already. I hope they are able to equip the robots with advanced measuring and collecting tools similar to the Mars rover to get data not easily available.

I do not know how many sensors they have in these robots but I am sure there are many. They are becoming very small and dirt cheap.

As the robots move, everything will be recorded. I think it will be a massive amount.

Before, people fed hamsters and canaries, now, we have to think about whether it is worth feeding excess data :) !BEER

I guess there are some skills, like data analysis, which might be rapidly going out of use due to the advancements of artificial intelligence in our world.

My view is there are a lot of jobs that are going to be eliminated. We could see call centers losing a lot of people as conversational #ai takes over.

With the way artificial intelligence is making waves, it actually looks like it is setting the standards for us.
