Why Run AI Locally?

As a DevOps Engineer in Nepal, I've experimented with every AI coding assistant out there. Claude, Copilot, Cursor — they're great, but they all share one problem: your code leaves your machine. For teams handling sensitive infrastructure, that's a non-starter.

Enter Ollama — a free, open-source tool that runs AI models entirely on your laptop. No API keys, no subscriptions, no data leaving your device. And with Google's Gemma 4:e2b model, you get a surprisingly capable coding assistant for free.

In this guide, I'll walk you through setting up Ollama + Gemma 4:e2b and integrating it with your workflow — all at zero cost.

What You'll Need

| Requirement | Details |
|-------------|---------|
| RAM | 8 GB minimum, 16 GB recommended |
| Storage | ~4 GB for the Gemma 4:e2b model |
| OS | Linux, macOS, or Windows (WSL2) |
| Internet | Only for the initial download |

Most modern laptops in Nepal — even budget ones — can handle this. I run it on my development machine alongside Docker containers without issues.

Step 1: Install Ollama

Ollama makes installation dead simple. Open your terminal and run:

bash
# Linux/macOS
curl -fsSL https://ollama.com/install.sh | sh

# Windows (WSL2)
# Install WSL2 first, then run the Linux command above

Verify the installation:

bash
ollama --version

You should see the Ollama version. The service runs in the background automatically.

Step 2: Download the Gemma 4:e2b Model

Gemma 4 is Google's latest open-weight model. The e2b variant is compact enough to run comfortably on a laptop while staying capable at coding tasks, which makes it a good fit for a local development assistant.

bash
ollama pull gemma4:e2b

This downloads roughly 2–4 GB. On Nepal's average fiber connection (50–100 Mbps), it takes about 5–10 minutes.
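That estimate lines up with quick back-of-the-envelope arithmetic (a sketch; the helper name is just for illustration):

```python
def download_minutes(size_gb: float, mbps: float) -> float:
    """Estimate download time for a model pull at a given link speed."""
    megabits = size_gb * 8 * 1000  # 1 GB ~ 8,000 megabits (decimal units)
    return megabits / mbps / 60    # seconds at `mbps`, converted to minutes

# A 4 GB pull: ~10.7 minutes at 50 Mbps, ~5.3 minutes at 100 Mbps
```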

Verify it's ready:

bash
ollama list

You should see gemma4:e2b in the list.

Step 3: Chat with Your Local AI

Test the model right away:

bash
ollama run gemma4:e2b "Write a Dockerfile for a Node.js Express app"

You'll get a response generated entirely on your machine. No cloud. No API calls. No cost.

Interactive Chat Mode

For an ongoing conversation:

bash
ollama run gemma4:e2b

Then just type your questions. Exit with /bye.

Step 4: Use Ollama as an API Server

Ollama runs a local API server at http://localhost:11434. It exposes its own native API (used below) plus an OpenAI-compatible endpoint under /v1, so most tools that speak the OpenAI API can point at your local model.

bash
# Test the API
curl http://localhost:11434/api/generate -d '{
  "model": "gemma4:e2b",
  "prompt": "Explain what Kubernetes does in 2 sentences",
  "stream": false
}'
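If you'd rather script against the API than use raw curl, here's a minimal Python sketch of the same /api/generate call (the helper names are my own, and it assumes the Ollama service is running on its default port):

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_payload(model: str, prompt: str) -> bytes:
    """Serialize a non-streaming request body for /api/generate."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def extract_answer(raw: bytes) -> str:
    """Pull the generated text out of a /api/generate JSON response."""
    return json.loads(raw)["response"]

def ask(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return the reply."""
    req = request.Request(
        OLLAMA_URL,
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return extract_answer(resp.read())

# With the server running: ask("gemma4:e2b", "Explain Kubernetes in 2 sentences")
```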

Integrating with VS Code / IDE

Several VS Code extensions support Ollama out of the box:

  1. Install Continue or Twinny from the VS Code marketplace
  2. Set the provider to Ollama
  3. Set the model to gemma4:e2b
  4. Start coding with AI assistance
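For Continue, the provider and model from steps 2–3 end up in its config file. A sketch of the long-standing ~/.continue/config.json format (check the extension's docs, as newer versions also accept a YAML config):

```json
{
  "models": [
    {
      "title": "Gemma 4 e2b (local)",
      "provider": "ollama",
      "model": "gemma4:e2b"
    }
  ]
}
```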

These extensions give you inline code completion, in-editor chat, and code explanation, all running locally, all free.

Step 5: Optimize for Your Hardware

If Gemma 4:e2b feels slow, here are some tuning tips:

Tune GPU Offload and Parallelism

bash
# Limit how many requests the server handles at once
OLLAMA_NUM_PARALLEL=2 ollama serve

# Reduce GPU offload for a model with the num_gpu option,
# e.g. inside an interactive `ollama run` session:
#   /set parameter num_gpu 20
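Variables set inline like this only last for that invocation. On Linux, where the install script registers Ollama as a systemd service, you can make a setting persistent with a drop-in override (a sketch assuming the default service name):

```ini
# Created via `sudo systemctl edit ollama`,
# then applied with `sudo systemctl restart ollama`
[Service]
Environment="OLLAMA_NUM_PARALLEL=2"
```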

Use a Smaller Model

If gemma4:e2b is too heavy, try lighter alternatives:

bash
ollama pull gemma4:1b    # 1 billion parameters — runs on anything
ollama pull qwen2.5:3b   # Great for coding, moderate RAM usage

Why This Matters for Developers in Nepal

Let me be honest — most AI coding tools are subscription-based, and paying $20/month for GitHub Copilot or $200/month for Claude Pro adds up fast when you're earning in NPR.

Running Ollama locally gives you:

  1. Zero ongoing costs — download once, use forever
  2. Privacy — your code never leaves your machine
  3. Offline access — works without internet (great for Nepal's occasional connectivity issues)
  4. No rate limits — ask as many questions as you want
  5. Full control — swap models, tweak parameters, fine-tune if needed

For DevOps Engineers, system administrators, and developers in Nepal, this is a game-changer. You get AI-assisted coding without the monthly bill.

Limitations — Be Realistic

Gemma 4:e2b is impressive but not magical: expect a smaller context window than cloud models, weaker performance on large multi-file refactors, slower responses on modest hardware, and the occasional hallucinated API.

But for everyday coding — writing scripts, explaining code, generating boilerplate, debugging — it's genuinely useful. And it's free.

What's Next?

Once you're comfortable with Ollama and Gemma 4, explore:

  1. Other models on the Ollama registry (coding-focused options like qwen2.5-coder are popular)
  2. Custom Modelfiles to bake in your own system prompts and parameters
  3. Web frontends such as Open WebUI for a ChatGPT-style interface

The local AI ecosystem is evolving fast. Getting started now puts you ahead of the curve.


Building AI-powered infrastructure or need help setting up local tooling for your team in Nepal? Let's talk.