The Need For The Democratization Of Data Keeps Growing

about 1 year ago

What is the purpose of Web 3.0?

This is a question that can be answered in a number of different ways. Over the last couple of years, we have tried to delve into it as things evolved.

For the moment, there is a major need forming, one which Web 3.0 can fulfill. In fact, I think it is the most crucial role, at least in the medium term, that Web 3.0 can serve.

It is the democratization of data.

Here we are looking at something that is becoming more important with each passing day. We keep seeing actions taken that stress how important this is.

YouTuber Sues OpenAI

To start, it does not matter what anyone's opinion of Sam Altman or OpenAI is. That isn't the basis of the conversation.

What we are dealing with is the availability of data to anyone who isn't funded with billions of dollars. It is essentially the David vs. Goliath story all over again.

The headline above outlines what appears to be the next step in legal battles.

I am no legal scholar, so the particulars are outside my range of knowledge. However, we have an individual who is suing OpenAI, claiming copyright infringement based upon the scraping of transcripts posted by YouTube. This is not Google suing but the individual video creator.

Certainly, there is no way to know how this will turn out. In fact, it doesn't matter. OpenAI has access to plenty of resources to tie things up for a long time.

The question always comes back to the hypothetical ABC startup that is working on its own model. What does it do?

From the article, here are the particulars:

“As [OpenAI’s] AI products become more sophisticated through the use of training data sets, they become more valuable to prospective and current users, who purchase subscriptions to access [OpenAI’s] AI products,” the complaint reads. “Much of the material in OpenAI’s training data sets, however, comes from works that were copied by OpenAI without consent, without credit, and without compensation.”

Millette, represented by the law firm Bursor & Fisher, is seeking a jury trial and over $5 million in damages for all YouTube users and creators whose data might’ve been swept up in OpenAI’s training.

To be honest, $5 million sounds like an amount OpenAI could settle by simply writing a check. But again, what about a much smaller company?

This also leads to a much bigger issue: The Internet is being locked down.

We have mentioned how both Reddit and X took steps to stop, or at least reduce, the scraping of their data. They obviously are not the only ones.

More than 35% of the world’s top 1,000 websites now block OpenAI’s web crawler, according to data from Originality.AI. And around 25% of data from “high-quality” sources has been restricted from the major datasets used to train AI models, a study by MIT’s Data Provenance Initiative found. Should the current access-blocking trend continue, the research group Epoch AI predicts that developers will run out of data to train generative AI models between 2026 and 2032.

If they are blocking OpenAI's crawler, it is likely that others will face the same fate.
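As a rough illustration of how this kind of blocking often works: one common mechanism is a robots.txt file that names the crawler's user agent. The sketch below uses Python's standard urllib.robotparser to evaluate such a file. The robots.txt contents, the example.com URL, and the "SmallStartupBot" name are hypothetical, made up for illustration. "GPTBot" is the user agent OpenAI publishes for its crawler. The quoted article does not say which blocking method the sites use, so treating it as robots.txt is an assumption.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: it turns away OpenAI's
# crawler (user agent "GPTBot") but leaves the site open to everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"  # hypothetical URL
    for agent in ("GPTBot", "SmallStartupBot"):  # second name is made up
        print(agent, "allowed" if allowed(agent, page) else "blocked")
```

In this particular file only the named crawler is turned away. A site that wants to shut out every unknown crawler only has to put the Disallow rule under the wildcard entry instead, which is where a small company's bot would get caught. Note that robots.txt is a request that well-behaved crawlers honor, not a technical barrier.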

Who does this favor?

Mega Tech Being Protected

While it is a stretch to think this is being done to protect Big Tech, it becomes obvious who is benefiting.

Notice how Google, X, and Meta are not facing these lawsuits. The same is true, at least to my knowledge, of Anthropic, which is tied to Amazon. These companies have enormous databases that keep growing daily. Each sees comments, clicks, and image/video uploads provided every day.

Here we see a major advantage.

As for running out of data, my view is the next wave is going to come from robotics. This is embodied AI, operating in the real world. The data generated here will be mammoth, far outpacing what humans produce.

There is one problem: who is going to be amassing this data?

If we look at the companies involved, while there are a few startups, they are tied to Big Tech. Once again, we are faced with the same situation.

What is ironic about the lawsuit filed by the YouTuber is that it was against OpenAI. Apparently, the fact that Google used the same data to train Gemini, its large language model, is not worthy of legal action.

Naturally, the individual probably has no case against Google, since each user provides the company with the data voluntarily, under terms that grant it broad rights to use what is uploaded.

Closed Internet

We are seeing further closing down of the Internet before our eyes. This is not the open mecca many of the early developers envisioned.

Most are aware we are dealing with a siloed system. This was evident from the early 2000s when a few companies started to take over.

If data is the new oil, as the saying goes, who is finding the gushers? It isn't the small company that put together a few dollars along with some fancy coding. Instead, we are seeing the trillion-dollar corporations looking to move even further ahead.

From what I can gather, the basis of AI is compute and data. If the data online is being locked down at an increasing rate, we can see how the benefits are being directed into the hands of just a few players.

Web 3.0 is the solution.

As we have stated on a number of occasions, the solution is for nobody to own the data. This is the essence of a decentralized, permissionless database. When the data written is housed on unrelated computers, with no single owner acting as gatekeeper, anyone can access it.

This is truly the democratization of data.

It gets things out of the hands of Big Tech.

If we think an Internet ruled by the silos of Silicon Valley is bad, just think of what it will look like when they control the essence of many future technologies.

This is why a new iteration is required. We have a choice: either Big Tech sits at the center of everything, or things are democratized.

The way things are unfolding is becoming clearer with each story like this.

Posted Using InLeo Alpha

8 comments

@jammyjtr 56

about 1 year ago

I find this unfair to OpenAI, as other Big Tech companies aren't facing suits too. If OpenAI, which is connected to Microsoft, is having this much trouble, we shouldn't even bother talking about small companies joining the race, because there won't be any.

Now that everyone sees the value of data, it is seriously on the verge of monopolization. That is a serious matter: they literally want to hold the future in their hands by making themselves an essential contributor to technology. The problem is that not many people use Web 3.0 compared to Web 2.0, and based on the current level of awareness, I don't see us beating them in the next 3 years.

@taskmaster4450le 81

about 1 year ago

I think it is rapidly becoming a serious case. It seems, however, that many see it differently. There are billions who feed traditional social media each day.

For most, this is not surprising since they know nothing of Web3. But there are a lot in Web3 who do this also.

@jammyjtr 56

about 1 year ago

True, I guess we are all addicted to the kitten and skit videos posted on there.

@outwars 72

about 1 year ago

It seems OpenAI is being hit from all sides. I saw a recent article saying that Elon plans to sue OpenAI. I wouldn't be surprised if YouTube helps fund the YouTuber's case, since it can be a way to hit a competitor and protect 'their' data from others.

@taskmaster4450le 81

about 1 year ago

Elon has filed another suit, I believe, after dropping the last one.

The number of lawsuits is stacking up. I think most media entities, at least in the print world, have gone after OpenAI legally.

I am not sure about Google. I haven't seen anything, but that doesn't mean they aren't suing too... or perhaps they are just waiting.

@daniasi 68

about 1 year ago

I am happy that you keep saying these things day in, day out. I only wish more eyes could read and digest these facts. The Internet is gradually becoming a closed canon, which is why Web3 is urgently needed.

@taskmaster4450le 81

about 1 year ago

It appears that way. Big Tech is doing its best to lock things down.

There is nothing we can do but try to combat it. We have some blockchains that can take in data. Hive is great because it is easy to write text to it.

Others will also factor in of course.

@daniasi 68

about 1 year ago

You are right, friend. It is just that Web3 seems to be moving in slow motion, and this delay means more data for the other side.
