nsight
Here are 17 public repositories matching this topic...
A simple and understandable CUDA kernel for batch-matmul operation
Updated Oct 15, 2018 - Cuda
Repository for the Architecture of Computers and Parallel Systems course at VŠB
Updated May 20, 2020 - C++
Fast, reproducible, and portable software development environments
Updated Dec 8, 2021 - Dockerfile
Matrix multiplication example performed with OpenMP, OpenACC, BLAS, cuBLAS, and CUDA
Updated May 31, 2022 - C++
Remote development on HPC clusters with VSCode
Updated Sep 19, 2022 - Jupyter Notebook
University Project for "Computer Architecture" course (MSc Computer Engineering @ University of Pisa). Implementation of a Parallelized Nearest Neighbor Upscaler using CUDA.
Updated Dec 29, 2023 - C
Accelerate and optimize existing C/C++ CPU-only applications using the most essential CUDA tools and techniques.
Updated May 23, 2024 - Jupyter Notebook
A reproducible GPU benchmarking lab that compares FP16 vs FP32 training on MNIST using PyTorch, CuPy, and Nsight profiling tools. This project blends performance engineering with cinematic storytelling—featuring NVTX-tagged training loops, fused CuPy kernels, and a profiler-driven README that narrates the GPU’s inner workings frame by frame.
Updated Sep 5, 2025 - Python
🎬 Explore GPU training efficiency with FP32 vs FP16 in this modular lab, utilizing Tensor Core acceleration for deep learning insights.
Updated Sep 6, 2025 - Python
The MNIST classification problem is a fundamental machine learning task that involves recognizing handwritten digits (0-9) from a dataset of 70,000 grayscale images (28x28 pixels each). It serves as a benchmark for evaluating machine learning models, particularly neural networks.
Updated Sep 12, 2025 - Cuda
🚀 High-performance implementations and benchmarks of SSSP and APSP algorithms (Bellman–Ford, Dijkstra, Floyd–Warshall, Johnson) in Serial, OpenMP, CUDA, and Hybrid CPU+GPU. Includes profiling, speedup plots, and HPC notebooks
Updated Oct 17, 2025 - Jupyter Notebook
Custom PyTorch CUDA kernel implementing optimized ReLU activation with vectorization, performance profiling, and memory analysis on Tesla T4 GPU achieving 75% bandwidth efficiency.
Updated Oct 27, 2025 - Jupyter Notebook
CUDA Samples and Nsight Guided Profiling Samples
Updated Nov 14, 2025 - Cuda
CUDA C++ practice project for RTX 4070 SUPER — explore GPU concurrency, pinned memory, and Nsight profiling. Includes SAXPY and 2D blur kernels to train optimization, stream overlap, and timing analysis for NVIDIA Developer Technology Engineering skillset.
Updated Nov 18, 2025 - Cuda