Post Not Found | Ujjwol Thapa

Why Run AI Locally?

As a DevOps Engineer in Nepal, I've experimented with every AI coding assistant out there. Claude, Copilot, Cursor — they're great, but they all share one problem: your code leaves your machine. For teams handling sensitive infrastructure, that's a non-starter.

Enter Ollama — a free, open-source tool that runs AI models entirely on your laptop. No API keys, no subscriptions, no data leaving your device. And with Google's Gemma 4:e2b model, you get a surprisingly capable coding assistant for free.

In this guide, I'll walk you through setting up Ollama + Gemma 4:e2b and integrating it with your workflow — all at zero cost.

What You'll Need

Requirement	Details
RAM	8 GB minimum, 16 GB recommended
Storage	~4 GB for the Gemma 4:e2b model
OS	Linux, macOS, or Windows (WSL2)
Internet	Only for the initial download

Most modern laptops in Nepal — even budget ones — can handle this. I run it on my development machine alongside Docker containers without issues.

Step 1: Install Ollama

Ollama makes installation dead simple. Open your terminal and run:

bash

# Linux/macOS
curl -fsSL https://ollama.com/install.sh | sh

# Windows (WSL2)
# Install WSL2 first, then run the Linux command above

Verify the installation:

bash

ollama --version

You should see the Ollama version. The service runs in the background automatically.

Step 2: Download the Gemma 4:e2b Model

Gemma 4 is Google's latest open-weight model. The e2b variant is optimized for coding tasks — perfect for a local development assistant.

bash

ollama pull gemma4:e2b

This downloads roughly 2–4 GB. On Nepal's average fiber connection (50–100 Mbps), it takes about 5–10 minutes.

Verify it's ready:

bash

ollama list

You should see gemma4:e2b in the list.

Step 3: Chat with Your Local AI

Test the model right away:

bash

ollama run gemma4:e2b "Write a Dockerfile for a Node.js Express app"

You'll get a response generated entirely on your machine. No cloud. No API calls. No cost.

Interactive Chat Mode

For an ongoing conversation:

bash

ollama run gemma4:e2b

Then just type your questions. Exit with /bye.

Step 4: Use Ollama as an API Server

Ollama runs a local API server at http://localhost:11434. This means any tool that supports OpenAI-compatible APIs can use your local model.

bash

# Test the API
curl http://localhost:11434/api/generate -d '{
  "model": "gemma4:e2b",
  "prompt": "Explain what Kubernetes does in 2 sentences",
  "stream": false
}'

Integrating with VS Code / IDE

Several VS Code extensions support Ollama out of the box:

Install Continue or Twinny from the VS Code marketplace
Set the provider to Ollama
Set the model to gemma4:e2b
Start coding with AI assistance

These extensions give you:

Code completion
Inline explanations
Refactoring suggestions
Test generation

All running locally, all free.

Step 5: Optimize for Your Hardware

If Gemma 4:e2b feels slow, here are some tuning tips:

Limit GPU/CPU Usage

bash

# Set number of layers to offload to GPU (if available)
OLLAMA_NUM_GPU_LAYERS=20 ollama serve

# Limit CPU threads
OLLAMA_NUM_PARALLEL=2 ollama serve

Use a Smaller Model

If gemma4:e2b is too heavy, try lighter alternatives:

bash

ollama pull gemma4:1b    # 1 billion parameters — runs on anything
ollama pull qwen2.5:3b   # Great for coding, moderate RAM usage

Why This Matters for Developers in Nepal

Let me be honest — most AI coding tools are subscription-based, and paying $20/month for GitHub Copilot or $200/month for Claude Pro adds up fast when you're earning in NPR.

Running Ollama locally gives you:

Zero ongoing costs — download once, use forever
Privacy — your code never leaves your machine
Offline access — works without internet (great for Nepal's occasional connectivity issues)
No rate limits — ask as many questions as you want
Full control — swap models, tweak parameters, fine-tune if needed

For DevOps Engineers, system administrators, and developers in Nepal, this is a game-changer. You get AI-assisted coding without the monthly bill.

Limitations — Be Realistic

Gemma 4:e2b is impressive but not magical:

It won't match Claude Sonnet or GPT-4 on complex reasoning
It can hallucinate on obscure topics
Context window is smaller than cloud models

But for everyday coding — writing scripts, explaining code, generating boilerplate, debugging — it's genuinely useful. And it's free.

What's Next?

Once you're comfortable with Ollama and Gemma 4, explore:

Multi-model setups — Run different models for different tasks
Custom system prompts — Tailor responses for your workflow
CI/CD integration — Use Ollama in pipelines for automated code review
Docker deployment — Containerize Ollama for team access

The local AI ecosystem is evolving fast. Getting started now puts you ahead of the curve.

Building AI-powered infrastructure or need help setting up local tooling for your team in Nepal? Let's talk.

How to Set Up Ollama AI with Claude Code Using Gemma 4:e2b (100% Free)

Why Run AI Locally?

What You'll Need

Step 1: Install Ollama

Step 2: Download the Gemma 4:e2b Model

Step 3: Chat with Your Local AI

Interactive Chat Mode

Step 4: Use Ollama as an API Server

Integrating with VS Code / IDE

Step 5: Optimize for Your Hardware

Limit GPU/CPU Usage

Use a Smaller Model

Why This Matters for Developers in Nepal

Limitations — Be Realistic

What's Next?