

m0nirul/datasense


datasense

A Python library for intelligent schema inference and validation across various data formats.

Features

  • Infer schema from CSV, JSON, Parquet, and relational database tables.
  • Generate human-readable schema definitions (e.g., YAML, JSON, Python dict).
  • Validate new data rows or files against a loaded schema, reporting type mismatches, missing fields, or constraint violations.
  • Suggest data type coercions and transformations based on inferred patterns.
  • Support for custom validation rules and data cleaning hooks.
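To make the validation feature concrete, here is a minimal standalone sketch of checking one row against a schema in the `{"fields": [...]}` shape used in the Quick Start example. It does not call datasense: `validate_row` and `TYPE_MAP` are hypothetical names invented for illustration, and the library's actual validation API may look quite different.

```python
from typing import Any

# Hypothetical mapping from schema type names to Python types (an assumption,
# not taken from datasense).
TYPE_MAP = {"integer": int, "string": str, "boolean": bool, "number": float}


def validate_row(row: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return a list of problems found in `row`; an empty list means it is valid."""
    errors = []
    for field in schema["fields"]:
        name, type_name = field["name"], field["type"]
        if name not in row:
            errors.append(f"missing field: {name}")
            continue
        value = row[name]
        expected = TYPE_MAP[type_name]
        # bool is a subclass of int in Python, so reject it for non-boolean fields.
        if isinstance(value, bool) and expected is not bool:
            matches = False
        else:
            matches = isinstance(value, expected)
        if not matches:
            errors.append(
                f"type mismatch for {name}: expected {type_name}, "
                f"got {type(value).__name__}"
            )
    return errors


schema = {
    "fields": [
        {"name": "id", "type": "integer"},
        {"name": "name", "type": "string"},
        {"name": "age", "type": "integer"},
        {"name": "is_active", "type": "boolean"},
    ]
}

# "age" has the wrong type and "is_active" is absent, so two problems are reported.
problems = validate_row({"id": 1, "name": "Alice", "age": "30"}, schema)
for problem in problems:
    print(problem)
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem in a row at once, which matches the "reporting type mismatches, missing fields" wording above.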

Quick Start

Installation

First, install the library using pip:

pip install datasense

Inferring a Schema

Let's infer a schema from a simple CSV file. Create a file named data.csv:

id,name,age,is_active
1,Alice,30,true
2,Bob,24,false
3,Charlie,35,true

Now, use datasense to infer its schema:

import json
from datasense.core import infer_schema

# This example assumes infer_schema can read directly from a file path.
# In a real scenario, you might pass a file object or a specific format handler instead.
csv_file_path = "data.csv"

# Infer schema
inferred_schema = infer_schema(csv_file_path, format='csv')

# Print the inferred schema (e.g., as a pretty JSON string)
print(json.dumps(inferred_schema, indent=2))

# Expected output (simplified example):
# {
#   "fields": [
#     {"name": "id", "type": "integer"},
#     {"name": "name", "type": "string"},
#     {"name": "age", "type": "integer"},
#     {"name": "is_active", "type": "boolean"}
#   ]
# }

This example shows the basic usage of datasense: pass a file to infer_schema and get back a schema you can print or save. Refer to the documentation for more advanced features such as custom rules, validation, and other data formats.
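For readers curious how column types might be inferred from raw CSV text, the following is a rough sketch using only the standard library. It is not datasense's implementation: `infer_type` and `infer_csv_schema` are hypothetical helpers, and the rule of trying boolean, then integer, then number, then falling back to string is an assumption chosen to reproduce the expected output above.

```python
import csv
import io


def infer_type(values):
    """Pick the narrowest type name that fits every non-empty value in a column."""
    non_empty = [v for v in values if v not in ("", None)]
    if not non_empty:
        return "string"
    if all(v.lower() in ("true", "false") for v in non_empty):
        return "boolean"
    # Try progressively wider numeric types; the first one that parses every value wins.
    for type_name, caster in (("integer", int), ("number", float)):
        try:
            for v in non_empty:
                caster(v)
            return type_name
        except ValueError:
            continue
    return "string"


def infer_csv_schema(text):
    """Build a {"fields": [...]} schema from CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return {
        "fields": [
            {"name": name, "type": infer_type([row[name] for row in rows])}
            for name in (reader.fieldnames or [])
        ]
    }


sample = "id,name,age,is_active\n1,Alice,30,true\n2,Bob,24,false\n3,Charlie,35,true\n"
sketch_schema = infer_csv_schema(sample)
print(sketch_schema)
```

A real inference engine would likely sample rows rather than load the whole file, and would need rules for dates, nulls, and mixed columns; this sketch only covers the four types shown in the example.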
