UgannA Siyabasa — FastText Sinhala Embedding Model 🇱🇰

license: cc-by-nc-4.0
language: si
pipeline_tag: feature-extraction
library_name: fasttext

UgannA Siyabasa (උගන්නැ සියබස) is the first public FastText embedding model released by Remeinium Corp. The name comes from Kumaratunga Munidasa’s timeless quote:

“උගන්නැ සියබස – මත් වන්නැ එහි රසයෙන්”
Learn Sinhala – be intoxicated with its beauty.

Just as Munidasa sought to nurture the Sinhala language, this model is an attempt to teach it to machines.

📌 Key Features

Type: FastText (official library)
Vector size: 100 dimensions
File size: ~1.56 GB
Training data: 6.2 GB of processed Sinhala text
Performance:
- Similar-word retrieval accuracy: 0.90+ (tested)
- Outperforms cc.si.300.bin baseline (~0.76)
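
The README does not say how the 0.90+ figure was measured. One plausible reading, sketched here as an assumption and not as the authors' protocol, is top-k retrieval accuracy: for each (query, expected) pair, check whether the expected word appears among the query's k nearest neighbours. The name `retrieval_accuracy` and the pair list are hypothetical; the neighbour function is expected to return `(similarity, word)` tuples, which is the shape fasttext's `get_nearest_neighbors` returns.

```python
from typing import Callable, Iterable, List, Tuple

# A neighbour function returns (similarity, word) pairs, the same shape
# that fasttext's model.get_nearest_neighbors(word, k) returns.
NeighborFn = Callable[[str, int], List[Tuple[float, str]]]


def retrieval_accuracy(
    pairs: Iterable[Tuple[str, str]],
    neighbors: NeighborFn,
    k: int = 10,
) -> float:
    """Fraction of (query, expected) pairs whose expected word is in the top-k neighbours."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = 0
    for query, expected in pairs:
        # Keep only the words; the similarity scores are not needed here.
        top_words = {word for _score, word in neighbors(query, k)}
        if expected in top_words:
            hits += 1
    return hits / len(pairs)
```

With a loaded model, this could be called as `retrieval_accuracy(pairs, model.get_nearest_neighbors, k=10)`, where `pairs` is a test set you supply. Running the same pairs against cc.si.300.bin would give a like-for-like comparison.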

🔧 Usage

Hugging Face Model

Download the model file from Hugging Face, then load it locally with the fasttext library:
👉 Hugging Face Model Page

import fasttext

# Load the model from a local path (download the .bin file from Hugging Face first)
model = fasttext.load_model("UgannA_Siyabasa.bin")

# Get vector for a word
vector = model.get_word_vector("අම්මා")

# Get nearest neighbors
neighbors = model.get_nearest_neighbors("අම්මා", k=10)
print(neighbors)
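
Beyond nearest neighbours, a common use of word vectors is comparing two specific words. The sketch below computes cosine similarity in plain Python; `cosine_similarity` and `word_similarity` are hypothetical helper names, not part of fasttext, and the only model method assumed is `get_word_vector`, which the example above already uses.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(float(x) * float(y) for x, y in zip(a, b))
    norm_a = math.sqrt(sum(float(x) * float(x) for x in a))
    norm_b = math.sqrt(sum(float(y) * float(y) for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def word_similarity(model, first: str, second: str) -> float:
    """Compare two words using any object that exposes get_word_vector(word)."""
    return cosine_similarity(
        model.get_word_vector(first),
        model.get_word_vector(second),
    )
```

For example, `word_similarity(model, "අම්මා", "තාත්තා")` would compare "mother" with "father"; the second word is chosen here purely for illustration.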

GitHub Repository

We also provide code samples and utilities on GitHub:
👉 Remeinium GitHub

📂 Training Data

Processed and cleaned training corpus: ~6.2GB
Preprocessing: tokenization, normalization, deduplication
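
The actual preprocessing pipeline is not published here, so the following is only a minimal sketch of what the three named steps could look like, under stated assumptions: normalization is taken to mean Unicode NFC plus whitespace cleanup, tokenization is plain whitespace splitting, and deduplication drops exact duplicate lines. All function names are hypothetical.

```python
import unicodedata
from typing import Iterable, Iterator, List


def normalize_line(line: str) -> str:
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", line)
    # str.split() does not treat the zero-width joiner (U+200D) as
    # whitespace, so Sinhala conjunct forms that rely on it are preserved.
    return " ".join(text.split())


def tokenize(line: str) -> List[str]:
    """Split on whitespace; a real pipeline might also separate punctuation."""
    return line.split()


def preprocess(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield token lists for each non-empty line, skipping exact duplicates."""
    seen = set()
    for raw in lines:
        clean = normalize_line(raw)
        if not clean or clean in seen:
            continue
        seen.add(clean)
        yield tokenize(clean)
```

Keeping every cleaned line in a set is fine for small samples; for a corpus of several gigabytes, storing a hash of each line instead would reduce memory use.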

🗜️ License

This model is released under CC BY-NC 4.0 (non-commercial use).
🔗 For commercial usage, please contact: support@remeinium.com

⚠️ Limitations

Vocabulary coverage is limited to the training dataset.
May reflect cultural and linguistic biases present in the source texts.
Optimized for Sinhala only; it is not multilingual (future versions will expand language coverage).
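
To see the vocabulary limitation in practice, you can check which words have no whole-word entry in the model. This sketch relies on fasttext's `get_word_id`, which returns -1 for words outside the dictionary; `out_of_vocabulary` is a hypothetical helper name. If the model was trained with subword n-grams enabled (the fastText default for unsupervised models, though this README does not confirm the setting), `get_word_vector` still returns a vector for such words, composed from character n-grams and usually less reliable.

```python
from typing import Iterable, List


def out_of_vocabulary(model, words: Iterable[str]) -> List[str]:
    """Return the words the model has no whole-word entry for.

    Assumes the model exposes get_word_id(word), which in fasttext
    returns -1 when the word is not in the dictionary.
    """
    return [word for word in words if model.get_word_id(word) == -1]
```

Calling `out_of_vocabulary(model, my_words)` on a sample of your own text gives a quick estimate of how much of it falls outside the training vocabulary.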

🤝 Collaboration

You are welcome to:

Use this model for research & personal projects
Share improvements, benchmarks, or downstream applications

📧 Contact us: support@remeinium.com
