Most of the projects are useless spam 😑

For now, I believe checking repo age would filter out a significant share of the spam.
You can see many spam repos with age below 1 year. It is also very rare to see popular repos under 1 year old.

Of course, other factors like stars can be manipulated too, but not as easily as the number of dependencies, which is what counts right now. The fact is that most people rarely give stars for no reason. Also, creating multiple accounts on GitHub today is prone to suspension :blush:
Yes, tea also needs to whitelist supported repo hosts (github.com, gitlab.com, etc.) to prevent manipulation through self-hosted instances.
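A host whitelist could be as small as the sketch below. The `ALLOWED_HOSTS` set and the `is_allowed_host` helper are hypothetical (tea hasn't published such a list), and SSH "scp-style" URLs like `git@github.com:owner/repo.git` are simply rejected here to keep it short.

```python
from urllib.parse import urlparse

# Hypothetical whitelist of public repo hosts; extend as needed.
ALLOWED_HOSTS = {"github.com", "gitlab.com"}


def is_allowed_host(repo_url: str) -> bool:
    """Return True if the repository URL points at a whitelisted public host."""
    url = repo_url.strip()
    # npm accepts "git+https://..." URLs; drop the "git+" prefix before parsing.
    if url.startswith("git+"):
        url = url[len("git+"):]
    host = urlparse(url).hostname or ""  # .hostname is already lowercased
    if host.startswith("www."):
        host = host[len("www."):]
    return host in ALLOWED_HOSTS
```

A self-hosted Gitea or GitLab instance on its own domain fails this check even if it mirrors a real project, which is the point: its stars and history can't be verified the same way.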

if the account age of the people starring is taken into account.

This is OK. It will probably be straightforward with a little math; let’s call it “preference intensity”.

So if Alice’s account has starred 100 repos, while Bob has only starred 2 repos,
then Alice’s intensity is 1/100 = 1% and Bob’s is 1/2 = 50%. So a repo that both of them starred gets 51% intensity.

This approach also makes it harder for star sellers to run their business, because the more repos an account has starred, the smaller the impact of each of its stars.
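The intensity math above can be sketched like this: each user's star is worth 1 divided by how many repos that user has starred, and a repo's intensity is the sum over its stargazers. The `preference_intensity` name and the input shape (user → set of starred repos) are assumptions for illustration; account age isn't factored in here.

```python
from collections import defaultdict


def preference_intensity(stars_by_user: dict[str, set[str]]) -> dict[str, float]:
    """Per repo, sum each stargazer's weight of 1 / (number of repos they starred)."""
    intensity: dict[str, float] = defaultdict(float)
    for user, repos in stars_by_user.items():
        if not repos:
            continue  # a user with no stars contributes nothing
        weight = 1.0 / len(repos)
        for repo in repos:
            intensity[repo] += weight
    return dict(intensity)


# The Alice/Bob example: Alice starred 100 repos, Bob starred 2.
alice = {"shared-repo"} | {f"repo-{i}" for i in range(99)}
bob = {"shared-repo", "other-repo"}
scores = preference_intensity({"alice": alice, "bob": bob})
# scores["shared-repo"] is 1/100 + 1/2, i.e. about 0.51
```

A star-selling account that stars thousands of repos ends up adding almost nothing to each one.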

  • repo age
  • package created at
  • user age
  • file count
  • file size
  • repo README content

are all decent qualitative and quantitative metrics to combine into an overall spam-score classifier. It’s something we’re working on quite a lot, but we would love external solutions.

How would y’all go about building a classifier?
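One common starting point is a logistic model over the metrics above: log-scale each metric, take a weighted sum plus a bias, and squash it into a 0–1 score. The weights and bias below are made up purely for illustration; a real system would fit them on projects already labeled spam / not spam. File count and size are treated as simple numbers, and README quality is reduced to its length, which is a big simplification.

```python
import math
from dataclasses import dataclass


@dataclass
class ProjectFeatures:
    repo_age_days: float
    package_age_days: float
    owner_account_age_days: float
    file_count: float
    total_file_size_kb: float
    readme_chars: float


# Hypothetical hand-picked weights, for illustration only. Negative means
# "bigger / older makes spam less likely".
WEIGHTS = {
    "repo_age_days": -0.4,
    "package_age_days": -0.2,
    "owner_account_age_days": -0.4,
    "file_count": -0.3,
    "total_file_size_kb": -0.1,
    "readme_chars": -0.3,
}
BIAS = 8.0


def spam_score(f: ProjectFeatures) -> float:
    """Return a probability-like spam score in (0, 1)."""
    # log1p compresses heavy-tailed values so one huge number can't dominate.
    z = BIAS + sum(w * math.log1p(getattr(f, name)) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


# A days-old project with a tiny README vs. a years-old, well-documented one.
fresh = ProjectFeatures(10, 10, 30, 5, 20, 200)
established = ProjectFeatures(1500, 1400, 3000, 300, 5000, 8000)
```

Keeping the model this simple makes each weight easy to inspect, which matters when you have to explain to a maintainer why their project was flagged.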


Regarding repo age: it should be measured relative to the date the project was registered in tea, not backwards from the current date, if you’re going to use it to sort through the spam that has already come in.
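In code, the difference is just which date you subtract from. A minimal sketch, assuming you have the repo's creation date and its tea registration date (both helper names are hypothetical), with the one-year cutoff mentioned earlier in the thread:

```python
from datetime import date

MIN_AGE_DAYS = 365  # the "below 1 year" threshold discussed above


def repo_age_at_registration(repo_created: date, registered_in_tea: date) -> int:
    """Repo age in days on the day it was registered in tea (not as of today)."""
    if registered_in_tea < repo_created:
        raise ValueError("registration date is earlier than repo creation")
    return (registered_in_tea - repo_created).days


def too_young_when_registered(repo_created: date, registered_in_tea: date,
                              min_age_days: int = MIN_AGE_DAYS) -> bool:
    return repo_age_at_registration(repo_created, registered_in_tea) < min_age_days
```

A repo created on 2024-03-01 and registered on 2024-03-10 was 9 days old at registration, so it stays flagged even after it passes its first birthday.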


It would make sense to add more filters so users can easily avoid spammy projects when choosing whom to stake on, such as creation date, number of commits, stars/forks of the repository, etc.
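Such filters amount to a handful of thresholds that all have to pass. The `Project` fields and the `filter_projects` signature below are assumptions for illustration, not the staking UI's real data model:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Project:
    name: str
    created: date
    commits: int
    stars: int
    forks: int


def filter_projects(projects: list[Project], *, created_before: Optional[date] = None,
                    min_commits: int = 0, min_stars: int = 0,
                    min_forks: int = 0) -> list[Project]:
    """Keep projects that meet every threshold; created_before=None skips the date filter."""
    return [
        p for p in projects
        if (created_before is None or p.created < created_before)
        and p.commits >= min_commits
        and p.stars >= min_stars
        and p.forks >= min_forks
    ]


projects = [
    Project("old-lib", date(2019, 5, 1), commits=850, stars=1200, forks=140),
    Project("tea-spam", date(2024, 4, 20), commits=3, stars=0, forks=0),
]
kept = filter_projects(projects, created_before=date(2023, 6, 1),
                       min_commits=50, min_stars=10)
```

With all defaults, nothing is filtered out, so the filters only narrow the list when a user opts in.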


Yes, I think that’s right.

I am sure the team is aware of bad actors within the project.

  1. Additional filters are a good idea. We’re thinking of starting with a spam filter based on an automated internal classification system we’ve been refining. It isn’t applied to registered projects yet, but that’s coming. Gimme some other ideas here.
  2. We’ve got a new feature (note that this isn’t a quest, but a direct response to this forum post) allowing users to report spam projects.

[Screenshot, 2024-06-04: the spam-report feature]

You should see this on the /oss-staking/:project_id pages of all registered projects. Allows us to capture some data about the health of registered projects.


This is a good idea. Individuals creating multiple projects isn’t good for Tea.


The fact is, spammers can somehow easily submit their projects to tea,
and if you have a REAL one you literally can’t, because of the dumb automatic spam filter )


Until recently, before the anti-spam system updates, some projects created solely for ITN with no actual use could be registered. However, this is no longer possible after the recent updates.

If you are referring to your project, handy-ether, it was published recently (2 months ago) and hasn’t seen any updates or upgrades since then. You wrote “package for tea xyz” in the repo description, and you created another package as a dependent of it. This suggests the project was created only for ITN. However, if you have bigger plans for this project, keep working on it and you will be able to register it later on testnet or mainnet.

Rewarding OSS contributions involves many challenges and factors that tea considers. The focus should not be on creating a project solely for rewards but on building and maintaining a project for the OSS ecosystem; the rewards will come as a bonus for those contributions.


Anyway, you can manually check what was already submitted.
I don’t mind anyway; it’s just kinda funny to see a basic Next.js draft there when these people don’t even know what code is )
They don’t even change the header


Couldn’t spam projects be detected by the pattern of the first link being git+https (which makes the browser fail to open the link)?

For example, “git+https://github.com/AganFebro/tea-febro.git”

Or is it a legit pattern?

Both syntaxes are supported, this is not a criterion for identifying a spam project.

“Either” means one of them, not both concatenated with a “+”. Or am I missing something, or is it just a bug in how the link is displayed?

The repository.url format is not a criterion in identifying a project as spam or not.
More explicitly, I meant that both repository.url formats (git+https or https) recommended by npm are supported by tea.
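Since both forms name the same repository, the browser problem raised above is a display issue: a page has to turn `git+https://…` into a plain `https://…` link before rendering it. Whether tea’s project page does this isn’t stated in the thread; this is just a sketch of such a normalization (the `browsable_repo_url` name is hypothetical):

```python
from urllib.parse import urlparse


def browsable_repo_url(repository_url: str) -> str:
    """Turn an npm repository.url like 'git+https://host/owner/repo.git'
    into a link a browser can open, e.g. 'https://host/owner/repo'."""
    url = repository_url.strip()
    if url.startswith("git+"):
        url = url[len("git+"):]  # 'git+https' -> 'https', 'git+ssh' -> 'ssh'
    parsed = urlparse(url)
    if not parsed.hostname:
        # Shorthand like 'github:owner/repo' or a form we don't handle: leave as-is.
        return repository_url
    path = parsed.path
    if path.endswith(".git"):
        path = path[: -len(".git")]
    # Port, query, and user info are dropped; only host and path are kept.
    return f"https://{parsed.hostname}{path}"
```

For example, `git+https://github.com/AganFebro/tea-febro.git` and `https://github.com/AganFebro/tea-febro.git` both normalize to the same browsable link.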


It’s so bad that people take advantage of the most vulnerable. Please find a way to fix the spam aspect of the project.

As an open source maintainer, I want to throw in my 5¢.

Relevant metrics to determine whether an npm project is relevant and useful to the community:

  • At least 100 GitHub stars.
  • If a project has more than 100 stars, check if it has at least a few thousand downloads.
  • Having a valid repo link that matches package.json
  • A GitHub repo being at least one year old
  • Having more than one version published on npm (a package very rarely has zero bugs or no need for improvements)
  • Package entrypoints should be valid and importable. You can use publint.dev for checking that.
  • Check whether the GitHub repo has a CONTRIBUTING.md. Most impactful open source projects have a contributing guide, and the standard convention is to keep it in that file
  • Make sure that the npm license field matches the GitHub LICENSE file

As a bonus, which I highly recommend:

  • GitHub + npm provenance set up. This verifies that a package was published and signed w/ CI and not an automated script or a human.
  • More than 1 contributor on GitHub. Should have a minor impact so it doesn’t get abused, but at the same time would remove packages and repos created by a script with no actual collaborators.
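Put together, the checklist above could be sketched like this. The sketch assumes the signals were already collected (GitHub API, npm registry, a publint run); fetching them isn’t shown. All field names are hypothetical, and `MIN_DOWNLOADS = 3_000` is a guess at “a few thousand”, since the post gives neither an exact number nor a time period.

```python
from dataclasses import dataclass

MIN_STARS = 100
MIN_DOWNLOADS = 3_000  # assumption: "a few thousand"; no period specified
MIN_REPO_AGE_DAYS = 365


@dataclass
class PackageSignals:
    stars: int
    downloads: int
    repo_link_matches_package_json: bool
    repo_age_days: int
    published_versions: int
    entrypoints_valid: bool  # e.g. the result of a publint check
    has_contributing_md: bool
    license_matches: bool  # npm "license" field vs. the repo's LICENSE file
    has_provenance: bool  # bonus
    contributors: int  # bonus


def failed_checks(s: PackageSignals) -> list[str]:
    """Return the names of the core checks this package fails."""
    checks = {
        "at least 100 GitHub stars": s.stars >= MIN_STARS,
        # Only applies above 100 stars: many stars but few downloads looks bought.
        "downloads consistent with star count": s.stars <= MIN_STARS or s.downloads >= MIN_DOWNLOADS,
        "repo link matches package.json": s.repo_link_matches_package_json,
        "repo at least one year old": s.repo_age_days >= MIN_REPO_AGE_DAYS,
        "more than one published version": s.published_versions > 1,
        "valid, importable entrypoints": s.entrypoints_valid,
        "has CONTRIBUTING.md": s.has_contributing_md,
        "license field matches LICENSE file": s.license_matches,
    }
    return [name for name, ok in checks.items() if not ok]


def bonus_points(s: PackageSignals) -> int:
    """Small extra credit for provenance and having more than one contributor."""
    return int(s.has_provenance) + int(s.contributors > 1)
```

Keeping the bonus as a separate small score, rather than another pass/fail check, follows the point that contributor count should only have a minor impact.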

I need stake for myself.