Most of the projects are useless spam 😑

For now, I believe checking repo age would filter out a significant share of the spam.
Many spam repos are less than 1 year old, and it is very rare to see a popular repo that young.

Of course, other factors like stars can be manipulated too, but not as easily as the number of dependencies can be right now. The fact is that most people rarely give stars for no reason. Also, creating multiple accounts on GitHub today is likely to get them suspended 😊
Yes, tea also needs to whitelist supported repo hosting services (github.com, gitlab.com, etc.) to prevent manipulation through self-hosted instances.
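
A rough sketch of what such a whitelist check could look like. The `ALLOWED_HOSTS` set and the function name are only assumptions for illustration; which hosts tea would actually support is up to the team.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; the real set of supported hosts would be tea's decision.
ALLOWED_HOSTS = {"github.com", "gitlab.com"}


def is_supported_repo_url(url: str) -> bool:
    """Return True only for https URLs whose host is exactly on the allowlist."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Exact match, so look-alike hosts such as "github.com.evil.example" are rejected.
    return parsed.scheme == "https" and host in ALLOWED_HOSTS
```

Matching the full hostname instead of a substring is what keeps a self-hosted look-alike domain from slipping through.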

if the account age of the people starring is taken into account.

This is OK. It will probably be straightforward with a little math; let’s call it “preference intensity”.

Say Alice’s account has starred 100 repos, while Bob’s has starred only 2.
Then Alice’s intensity is 1/100 = 1%, and Bob’s is 1/2 = 50%. So a repo that they both starred gets 51% intensity.

This approach will also make it harder for star sellers to run their business, because the more repos an account has starred, the smaller the impact of each star.
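
The arithmetic above can be sketched in a few lines. The function name and the input shape (one "total repos starred" count per stargazer) are my own assumptions, not an existing tea API.

```python
def preference_intensity(starred_counts):
    """Sum 1/n over the stargazers of a repo.

    Each n is the total number of repos that stargazer's account has starred,
    so accounts that star everything contribute very little.
    """
    return sum(1.0 / n for n in starred_counts if n > 0)


alice, bob = 100, 2  # Alice has starred 100 repos, Bob only 2
score = preference_intensity([alice, bob])
print(f"{score:.0%}")  # 51%
```

An account that has starred 10,000 repos would add only 0.01% per star, which is the effect that makes bulk star selling less worthwhile.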

  • repo age
  • package creation date
  • user age
  • file count
  • file size
  • repo README content

are all decent qualitative and quantitative metrics to collect into an overall spam score classifier. it’s something we’re working on quite a lot, but would love external solutions.

how would y’all go about building a classifier?
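
One simple starting point, before training anything, could be a rule-based score over the signals listed above. This is only a sketch: the `RepoSignals` fields, the thresholds, and the equal weighting are all made-up assumptions, and a real classifier would presumably be fitted on labeled spam and non-spam projects.

```python
from dataclasses import dataclass


@dataclass
class RepoSignals:
    # Hypothetical feature names, one per signal from the list above.
    repo_age_days: int
    package_age_days: int
    owner_account_age_days: int
    file_count: int
    total_size_bytes: int
    readme_chars: int


def spam_score(s: RepoSignals) -> float:
    """Return a score in [0, 1]; higher means more spam-like.

    Every threshold here is an illustrative guess, not a tuned value.
    """
    flags = [
        s.repo_age_days < 365,           # young repo
        s.package_age_days < 30,         # package created very recently
        s.owner_account_age_days < 180,  # young owner account
        s.file_count < 5,                # almost no files
        s.total_size_bytes < 10_000,     # almost no content
        s.readme_chars < 200,            # empty or stub README
    ]
    return sum(flags) / len(flags)
```

With labeled examples, the same six fields could serve as the feature vector for a trained model instead of hand-picked thresholds.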


Regarding repo age: it should be measured relative to the date the project was registered in tea, not backwards from the current date, if you’re going to use it to sort through the spam that has already come in.
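
To make the difference concrete, here is a small sketch. The function and parameter names are hypothetical, and the dates are invented for the example.

```python
from datetime import date


def age_at_registration_days(repo_created: date, registered_in_tea: date) -> int:
    """Repo age in days, measured at the moment it was registered in tea."""
    return (registered_in_tea - repo_created).days


# A repo created on 2023-01-01 and registered on 2023-03-01 was 59 days old
# at registration, no matter how much time has passed since then.
print(age_at_registration_days(date(2023, 1, 1), date(2023, 3, 1)))  # 59
```

Measured this way, a spam repo does not start to look legitimate just because it has been sitting in the registry for a year.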


It would make sense to add filters so that users can easily avoid spammy projects when choosing what to stake on, such as creation date, number of commits, and the repository's stars and forks.


yes i think it's right

I am sure the team is aware of bad actors within the project.