Java Linguistics Software


Browse free open source Java Linguistics Software and projects below.

1

Autshumato TMX Integrator

A utility application for updating and integrating, over a network, translation memories created by the Autshumato ITE. It is licensed under the TMate Open Source License and is free for anyone to download and use.

Downloads: 0 This Week

Last Update: 2018-04-24
See Project
2

BANAL

BANAL - Banal And Not A Language. A prototyping notation compatible with Java and C# (via the largest possible common footprint between the two).

Downloads: 0 This Week

Last Update: 2013-04-12
See Project
3

BANNER Named Entity Recognition System

BANNER is a named entity recognition system intended primarily for biomedical text. It uses conditional random fields as the primary recognition engine and includes a wide survey of the best techniques described in recent literature.

Downloads: 0 This Week

Last Update: 2015-07-30
See Project
4

Bermuda Text-to-Speech

This project includes basic NLP and DSP techniques for text-to-speech.

See the TTS demo at http://rslp.racai.ro/index.php?page=tts. The project is written entirely in Java and includes a set of tools and methods designed to enable multilingual text-to-speech (TTS) synthesis. We currently support English and Romanian, but we will soon train more models and make them available for download. To read more about our other NLP and TTS tools, see http://nlptools.racai.ro.

Downloads: 0 This Week

Last Update: 2014-03-24
See Project
5

BioContext

Software for extraction of biomedical information from literature

Downloads: 0 This Week

Last Update: 2012-02-12
See Project
6

BioEvent

This is a Java-based project for complex event extraction from text and co-reference resolution. Currently the code can read the BioNLP shared task format (http://2011.bionlp-st.org/) and the i2b2 Natural Language Processing for Clinical Data shared task format (https://www.i2b2.org/NLP/DataSets/Main.php). Event extraction involves finding the events in a text and the parameters of each event. The method is based on SVM, but other ML algorithms can be adopted. The method is described in detail in the following paper: Ehsan Emadzadeh, Azadeh Nikfarjam, and Graciela Gonzalez. 2011. Double Layered Learning for Biological Event Extraction from Text. In Proceedings of the BioNLP 2011 Workshop Companion Volume for Shared Task, Portland, Oregon, June. Association for Computational Linguistics.

Downloads: 0 This Week

Last Update: 2013-04-25
See Project
7

BioLemmatizer

Lemmatization tool for morphological analysis of biomedical literature

The BioLemmatizer is a domain-specific lemmatization tool for the morphological analysis of biomedical literature. It is tailored to the biological domain through integration of several published lexical resources related to molecular biology. It focuses on the inflectional morphology of English, including the plural form of nouns, the conjugations of verbs, and the comparative and superlative form of adjectives and adverbs. README: https://sourceforge.net/projects/biolemmatizer/files/

The BioLemmatizer 1.2 release adds an optional functionality to normalize British English spellings into American English spellings and then retrieve corresponding lemmas.

If you use the BioLemmatizer to support academic research, please cite the following paper: Haibin Liu, Tom Christiansen, William A Baumgartner Jr, and Karin Verspoor. BioLemmatizer: a lemmatization tool for morphological processing of biomedical text. Journal of Biomedical Semantics 2012, 3:3.

Downloads: 0 This Week

Last Update: 2013-10-23
See Project
8

Board Game Language

Board Game Language (BGL, pronounced "bagel") is a natural language syntax programming language for first-time programmers. It uses board games as a metaphor for programming concepts, with the goal of teaching users the foundations of programming.

Downloads: 0 This Week

Last Update: 2014-06-23
See Project
9

BuckTagger

User-assisted tool for Arabic stem entry to Buckwalter Morpho Analyzer

Using rules written in a Drools decision table, BuckTagger determines the correct Buckwalter Tag based on morphological properties of the input, automatically extracted or given by the user. At the moment, BuckTagger is not complete; it can only handle input that is:

- Uninflected
- In lexical form, i.e., no clitics or affixes
- A Perfect or Imperfect Verb
- Preferably with the first and before-last letters diacritized/vocalized

The interface is in Arabic. See the README for more details. There is much room for development. Feel free to comment.

Downloads: 0 This Week

Last Update: 2014-05-22
See Project
10

CHALICE

Connecting Historical Authorities with Links, Contexts and Entities. CHALICE is a historic placename gazetteer for the UK, published as Linked Data and linked to other widely-used sources of placename reference information on the semantic web.

Downloads: 0 This Week

Last Update: 2013-04-26
See Project
11

CLEiM

Cross Lingual Education in Medicine

CLEiM (Cross Lingual Education in Medicine) is an open source version of an intelligent system that extracts concepts from medical texts and provides qualified information, integrating information from various sources. The system was developed by the Intelligent System Group GSI (http://www.esi.uem.es/gsi/) at UEM University. Named Entity Recognition (NER) is based on the GATE platform. Installation is simple: the system runs as a web application and has been tested under apache-tomcat. The original system has been used successfully to carry out active learning activities with medical students, and it could also be of interest in many other fields of knowledge.

Downloads: 0 This Week

Last Update: 2014-09-10
See Project
12

Chaski

Distributed phrase-based machine translation training tool based on Hadoop.

Downloads: 0 This Week

Last Update: 2013-04-26
See Project
13

CoSyne Integrated Prototype

Multilingual Content Synchronization with Wikis: CoSyne is a Research and Technological Development project co-funded by the European Union. Details: http://cosyne.eu

Downloads: 0 This Week

Last Update: 2013-04-29
See Project
14

Colloquium QDA

A free and open source qualitative ethnographic interview coding tool.

Colloquium QDA is a tool for custom coding and analyzing qualitative ethnographic interviews. To run it, first make sure you have JRE 8 or later installed (http://www.oracle.com/technetwork/java/javase/downloads/). Colloquium QDA is an open source, cross-platform Java Swing app that uses an embedded Java DB with Lucene-integrated search.

Downloads: 0 This Week

Last Update: 2017-01-23
See Project
15

Communication Supporting System

Downloads: 0 This Week

Last Update: 2015-03-26
See Project
16

Communication Supporting System

Downloads: 0 This Week

Last Update: 2013-05-29
See Project
17

ConTextKit

ConTextKit is a Java-based implementation of Wendy Chapman's ConText algorithm for annotating the context of medical documents, specifically the negation, temporality, and experiencer.

Downloads: 0 This Week

Last Update: 2014-06-24
See Project
18

CorpSe

CORPSE (CORPus SEarch) is a powerful search engine written in Java. The aim is to provide an efficient implementation of a word level inverted index search with various cool functions that can be used on very large corpora.

1 Review

Downloads: 0 This Week

Last Update: 2013-04-26
See Project
19

Corpus Toolkit

A text management tool for linguistic purposes...

Downloads: 0 This Week

Last Update: 2017-11-23
See Project
20

Cunei Machine Translation Platform

Cunei is a data-driven machine translation system that builds dynamic, statistical models based on instances of known translations found in a corpus.

1 Review

Downloads: 0 This Week

Last Update: 2013-06-05
See Project
21

DArtikel!

Learn the articles of German words.

Learn German words at your own pace. With this system you can add the words you learned during the day and then do exercises with them. Written by: Jovanny Pablo Cruz Gómez, Computer Engineering student, IPN, ESIME Culhuacan, Mexico City.

Downloads: 0 This Week

Last Update: 2013-11-07
See Project
22

DCTFinder

Extract title and creation time from web page.

Web pages do not offer reliable metadata about their creation date and time. However, obtaining the document creation time is a necessary step before temporal normalization systems can be applied to web pages. DCTFinder is a system that parses a web page and extracts its title and creation date from its content. DCTFinder combines heuristic title detection, supervised learning with Conditional Random Fields (CRFs) for document date extraction, and rule-based creation time recognition. DCTFinder is released under the CeCILL free software license agreement. The system is described in the following paper (see the 'Files' section): Xavier Tannier. "Extracting News Web Page Creation Time with DCTFinder". Proceedings of the 9th Language Resources and Evaluation Conference. Reykjavik, Iceland.

Downloads: 0 This Week

Last Update: 2016-10-21
See Project
23

DawNLITE

DawNLITE is a Natural-Language-based Image Transmoding Engine. The software transforms an image to a video as recorded by a virtual camera panning and zooming over the image, following a natural language text description of the image.

Downloads: 0 This Week

Last Update: 2013-04-18
See Project
24

Dendrarium

A system for maintaining constituency syntax trees

Dendrarium is used to select and verify constituency syntax trees generated by the Świgra parser. The system is used at the Institute of Computer Science of the Polish Academy of Sciences (Instytut Podstaw Informatyki PAN) to build Składnica, a treebank for the Polish language.

Downloads: 0 This Week

Last Update: 2014-02-18
See Project
25

Discriminative Language Editor

Discriminative language editor based on ontologies

A text editor written in Java that detects discriminative expressions while the user is typing. When the internal ontology-based analyzer detects a potential discriminative expression, the editor alerts the user by underlining the related words in the text. A descriptive message about the issue is also shown when the cursor is placed over the flagged expression.

Downloads: 0 This Week

Last Update: 2016-10-30
See Project