My Coding Bot Stopped Repeating Itself After I Added Hindsight Memory
"Did it seriously just do that?" I leaned forward as our coding mentor CodeMentor AI is a coding practice web app with one key difference from The memory layer is powered by Hindsight, The app has 5 modules: a code editor for practice, a mistake memory LeetCode doesn't know you failed binary search
shalini mk
"Did it seriously just do that?" I leaned forward as our coding mentor
recommended the exact problem I kept failing, not because I told it to,
but because it remembered my last four sessions and noticed the pattern
before I did.
What We Built
CodeMentor AI is a coding practice web app with one key difference from
every other platform: it remembers you. Not just your score, but your
actual mistake patterns, your weak topics, your solving speed by language,
across every single session.
The memory layer is powered by Hindsight,
a persistent agent memory system from Vectorize. The LLM is
qwen/qwen3-32b, served through Groq. The frontend is React with Monaco
Editor, the same editor that powers VS Code.
The app has 5 modules: a code editor for practice, a mistake memory
tracker, an AI mentor chat, a personalized learning path generator,
and a progress analytics dashboard. Everything is wired through
Hindsight's retain() and recall() functions.
The Problem With Every Other Coding Platform
LeetCode doesn't know you failed binary search three times this week.
HackerRank doesn't know you always mess up recursion base cases.
Every single session starts from zero.
So the "personalized" recommendations are just topic filters. There's
no agent that actually learned from watching you code. You repeat the
same mistakes because nothing is tracking the pattern.
We wanted to fix that.
How Hindsight Memory Changes Everything
Every action in CodeMentor retains a memory to
Hindsight's agent memory system:
// When a student fails a problem, store the mistake as a memory
await hindsight.retain({
  type: "mistake_pattern",
  user: "Arun",
  pattern: "off-by-one error",
  language: "Python",
  frequency: 3,
  problems_affected: ["two-sum", "binary-search", "sliding-window"],
  timestamp: new Date().toISOString()
})
Before every AI response, the mentor recalls from memory:
// Recall before answering
const memories = await hindsight.recall(
  "what mistakes does Arun keep making in Python"
)

// Groq receives the recalled memories as system-prompt context
const response = await groq.chat.completions.create({
  model: "qwen/qwen3-32b",
  messages: [
    {
      role: "system",
      content: `You are CodeMentor AI. Here is what you remember
about this student: ${memories}
Use this to give specific, personalized advice.`
    },
    { role: "user", content: userMessage }
  ]
})
The mentor doesn't guess. It knows.
The Before vs After Moment
This is the demo moment that makes judges stop scrolling.
With Memory OFF, the bot says:
"Hello! What would you like to practice today?"
With Memory ON, after recalling from Hindsight:
"Hey Arun, you've hit recursion issues twice this week.
Want to try an easier problem first to build confidence?"
Same LLM. Same prompt. The ONLY difference is the recall() call
pulling real history from Hindsight before the response is generated.
We added a toggle switch in the navbar so you can flip between
the two modes live during a demo. It's the clearest possible way
to show what persistent memory actually does.
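The toggle itself is just a branch around the recall step. Here is a minimal sketch of that request path, with our own hypothetical helper names (`buildMentorContext`, `fakeRecall`) and an in-memory stand-in where the real app calls Hindsight's recall():

```javascript
// Hypothetical sketch of the toggle: identical base prompt in both
// modes; memoryOn only controls whether recall() runs first.
async function buildMentorContext(memoryOn, recall, userId) {
  const base = "You are CodeMentor AI.";
  if (!memoryOn) return base; // Memory OFF: no history, generic mentor

  // Memory ON: pull real history before the LLM ever sees the prompt
  const memories = await recall(`what mistakes does ${userId} keep making`);
  return memories.length > 0
    ? `${base}\nHere is what you remember about this student: ${memories.join("; ")}`
    : base;
}

// In-memory stand-in for hindsight.recall(), for demonstration only
const fakeRecall = async () => ["recursion base-case errors (2x this week)"];
```

Everything downstream of this function is unchanged, which is exactly why the OFF/ON comparison is fair.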
What We Stored in Hindsight
We retained four types of memories:
1. Problem attempts: every try, pass or fail, with error type
2. Mistake patterns: recurring issues like off-by-one, null pointer,
missing base case
3. Solved problems: language used, attempts taken, concepts covered
4. Session summaries: daily snapshots of weak and strong areas
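As a concrete sketch, a daily session summary (type 4) can be retained as a plain object. The field names below are our own convention, not a schema Hindsight enforces:

```javascript
// Hypothetical payload for a session-summary memory; field names are
// our own convention, not a schema Hindsight enforces.
const sessionSummary = {
  type: "session_summary",
  user: "Arun",
  date: new Date().toISOString().slice(0, 10), // daily snapshot key
  weak_areas: ["dynamic programming", "recursion"],
  strong_areas: ["arrays", "hash maps"],
  problems_attempted: 6,
  problems_solved: 4
};

// In the app, this object is passed to hindsight.retain(sessionSummary)
```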
We started by only storing solved problems. That gave us almost nothing
useful for personalization. The breakthrough came when we added mistake
patterns: suddenly the agent could say things like "you've had this
exact error 3 times" instead of giving generic advice.
What Surprised Us
We expected Hindsight to be useful for recommendations. We didn't
expect it to make the AI sound genuinely caring.
When the agent says "I noticed you haven't practiced dynamic programming
in 5 days", it's not hallucinating. It literally recalled that from a
session summary we retained 5 days ago. That grounding makes the
responses feel trustworthy in a way RAG alone never did.
The agent memory features in Vectorize
make this pattern surprisingly easy to implement. retain() and recall()
are the whole API surface. The hard part is deciding what to store.
Lessons Learned
Retain more than you think you need. We started minimal. Adding
mistake patterns and session summaries unlocked 80% of the useful
behaviors.
The recall query is everything. Vague queries return vague memories.
"off-by-one errors in Python arrays this week" returns exactly what
you need. "user mistakes" returns noise.
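One way we could keep queries specific is to template them from structured fields instead of writing them freehand. `buildRecallQuery` below is our own illustrative helper, not part of Hindsight's API:

```javascript
// Our own helper (not part of Hindsight): bake topic, language, and
// time window into every recall query so it never degrades into
// something vague like "user mistakes".
function buildRecallQuery({ topic, language, window }) {
  return `${topic} in ${language} ${window}`;
}

buildRecallQuery({
  topic: "off-by-one errors",
  language: "Python arrays",
  window: "this week"
});
// → "off-by-one errors in Python arrays this week"
```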
Show the memory working visibly. We added a Memory Log page that
shows every retain() call ever made. Users trusted the app more when
they could see what it knew about them.
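A lightweight way to build such a log is to wrap retain() so every call is recorded before being forwarded. `wrapRetain` and `memoryLog` are our own names, and the stand-in retain below replaces the real Hindsight client:

```javascript
// Our own wrapper (not part of Hindsight's API): record every retain
// call in a local log the Memory Log page can render.
const memoryLog = [];

function wrapRetain(retain) {
  return async (memory) => {
    memoryLog.push({ at: new Date().toISOString(), memory });
    return retain(memory); // forward to the real client unchanged
  };
}

// Usage with a stand-in retain; the real app wraps hindsight.retain
const retain = wrapRetain(async (m) => m);
retain({ type: "mistake_pattern", pattern: "off-by-one error" });
// memoryLog now holds one timestamped entry for the UI to display
```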
The before/after toggle is your best demo. Nothing explains
persistent memory faster than showing the agent with it OFF vs ON
side by side. Build this into your demo flow.
Don't over-engineer the LLM prompt. The recalled memories do the
heavy lifting. A simple system prompt + recalled context outperformed
our elaborate prompt engineering attempts.
Try It
- Live App: https://codementor-ai-inky.vercel.app/
- GitHub: https://github.com/shalz-collab/codementor-ai
- Hindsight: github.com/vectorize-io/hindsight
If you're building any kind of practice or coaching agent, the
retain/recall pattern here is reusable for any domain. The code
is all on GitHub.