Hey guys

So, I've been building the AI layer inside Little Moments. I think it’s worth talking about because the AI layer is what failed with Dateful, and because I think most people, myself included at first, get AI features wrong.

The instinct is generally to go bigger. Build the chatbot. Build the agent system. Then top it off with a sparkle icon and an "AI-powered" tag on the landing page and call it a day.

For Little Moments, I’m doing none of that. The word "AI" doesn't appear anywhere in the app, yet I’m using it in a few ways for both the free and paid versions.

Free forever might be risky, but I’ll explain below why I think it’s necessary. TL;DR: make it invisible in free, and magical in paid.

Also, I’m itching to make this available to all of you. Just waiting for approval…

Project: Little Moments
Diff Entry: #008
First Commit: ~4 weeks ago
TestFlight Users: 25
Revenue: $0
Target: TBD — product first
Running spend: ~$33 (domain + Supabase) — excludes Claude & Cursor subs

Free AI that helps you “Dig Deeper”

The first AI feature in the app is called Dig Deeper. It's the only AI feature available to every user for free with no limit, and it does one thing: help you turn a rough moment into a richer memory.

Say you capture a moment with a few sentences about something that happened, maybe spoken loosely into your phone while walking to coffee. It's raw, and that’s nice.

But Dig Deeper sits there as an option. Tap it, and Ellie reads what you wrote, asks one follow-up question about a detail you didn't mention, like a curious friend would, and gives you an enhanced preview with that detail woven in.

The key word there from a UX perspective is one.

Because v1 was not that.

v1 was a 3-question interrogation

My first version didn't even have the constrained daily prompt mechanic. You'd open a blank journal page, type your own heading, your own body of text, and then hit a "Dig Deeper" button that launched a full-screen chat modal. The AI would read your moment and ask three questions. Three rounds minimum before you'd see anything back.

And the output was better, sure. It had more detail and context to work with so it was a better story/memory.

But my friend Mitch said this: "I love it. But it's too much work. What if I only have one detail? I don't want to answer three questions in a row. I'm happy with where my moment is."

Then I tried it again myself, bias set aside, and went from loving the chat to hating the interrogation. I saw what he meant.

I'd built something to reduce the friction of having to think and type too much, and to bring more depth to a memory. But I was getting in the way of people's real moments. Not every moment needs to be a novel. Some are just a sentence, and that's enough.

v2 became progressive, not forced

So I rebuilt it. No blank pages. No titles to fill in (the AI generates them, because coming up with a title can feel like work and pressure). No forced multi-step flow. Dig Deeper is now a native part of capturing.

Now you capture a moment, see a preview, and there's a single "Dig Deeper" button. Tap it. One question. One enhanced preview. Done.

Want to keep going? Tap again. Another layer. Another detail pulled from your actual memory—what someone was wearing, what the light looked like, what you felt but didn't think to say.

But if you're happy after one round? You're done. Close it. Move on with your day.

The litmus test for “is the AI being helpful here?” is that it feels like progressive input, not forced extraction. Each round adds incremental value, and it never feels like work to the user.

In the first pass I built AI for the output quality I wanted, not the experience the person needed. An important takeaway.

People who love going deep, they go three, four rounds sometimes. People who just want a quick polish? One tap and out. I’ve been in both these camps depending on my mood and both work well and feel genuinely additive to my core journaling and logging habit.

It went from feeling like an interview to feeling like a friend sitting across from you, listening to your story for the first time, asking the one question that makes you remember something you almost forgot. It makes capturing moments better, and that’s the free loop of the app.

Which is why this AI feature, despite not being free to me, is going to be free for the user forever…

———————————————————————————

Every Dig Deeper session, every tap, every point where someone goes one round or five: PostHog and I are tracking it.
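
For flavor, here's roughly what those events look like in posthog-js terms. The app itself is native, and the event and property names below are my illustration, not the real schema:

  // Minimal sketch of a Dig Deeper analytics event, using posthog-js.
  // Event and property names are illustrative, not the app's actual schema.
  import posthog from "posthog-js";

  posthog.init("<project-api-key>", { api_host: "https://us.i.posthog.com" });

  // Fired each time someone runs a Dig Deeper round on a moment.
  posthog.capture("dig_deeper_round", {
    moment_id: "m_123", // which moment was enhanced
    round: 2,           // 1 = first tap, 2 = second tap, ...
    accepted: true,     // did they keep the enhanced preview?
  });

With events shaped like that, "what percentage of moments hit at least one round" becomes a question PostHog can answer directly.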

AI features force us to ask different questions. Not just "did people use it", but did they like what it produced, did it feel right, did it make them come back?

Right now I'm watching what percentage of moments hit at least one Dig Deeper round. That's the question that tells me if it's working. And PostHog lets me just ask it. Plain English. No SQL written by hand. No charts in a folder I'll forget about in a week.

P.S. Right now my number is very low, but at least now I know.

When you're shipping AI features, the cost of asking a question about your data needs to be zero. Otherwise you just won't ask. And then you're guessing if your AI is good. That's how you ship something bad and don't know for months.

1M free events a month. A real free tier, not a trial gate. Go set up PostHog this week for your next project—it’s quick to install and it will save you months of guessing.

———————————————————————————

Why Dig Deeper stays free

I could and did think about gating this behind premium…like X many Dig Deepers for free.

It costs me money every time someone uses it, even though it’s very constrained in what it can do (no open-ended chatting, which is where costs can run away). It’s just an Anthropic API call to Claude Sonnet, analyzing what you wrote and generating a follow-up question and an enhanced version.
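
For the curious, a minimal sketch of that call with the Anthropic TypeScript SDK. The prompt and JSON shape are placeholders, not my production prompt:

  // Rough shape of the Dig Deeper call. Prompt and response format are
  // illustrative placeholders, not the app's real ones.
  import Anthropic from "@anthropic-ai/sdk";

  const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the env

  async function digDeeper(momentText: string) {
    const msg = await anthropic.messages.create({
      model: "claude-sonnet-4-5", // whichever Sonnet snapshot you're on
      max_tokens: 512,
      messages: [{
        role: "user",
        content:
          `Here is a captured moment:\n\n${momentText}\n\n` +
          `Ask ONE warm follow-up question about a detail that isn't mentioned, ` +
          `then rewrite the moment leaving room for that detail. ` +
          `Reply as JSON: {"question": "...", "enhanced": "..."}`,
      }],
    });
    const block = msg.content[0];
    return block.type === "text" ? JSON.parse(block.text) : null;
  }

One bounded call per tap is also what keeps the per-use cost predictable.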

But Dig Deeper reinforces the core habit. Capture a moment, feel good about it, come back tomorrow. If the AI helps someone turn three rough sentences into something that actually moves them when they read it back, that person is coming back.

You can't charge for the thing that makes people form the habit.

Well, you can. But I think that’s bad business.

Research across 20+ AI product teams found that the smallest, almost invisible AI features (pre-filling a name, auto-transforming text, tiny bits of magic) often have a bigger impact than the flashy chatbots. The best AI features don't feel like AI features. They just make the thing you already do feel better.

That's what I want my free AI to be. Not a headline feature marketed as AI. Just a little background tool that makes the core action more rewarding.

What I’m betting on here is this: invisible free, magical paid. Free AI disappears into the thing people are already doing. Paid AI creates something genuinely new. Something that wouldn't exist without the data the user already generated (and is now better because the free AI enriched that data). That's the line in the sand.

The real AI investment—the expensive, complex, actually-worth-paying-for stuff—lives in premium.

Can I get AI to upsell based on a feeling?

If you've captured 7 or more moments in a month, on the 1st of the following month we generate a Chapter for you. It's a little monthly recap: your moments assembled into a unique narrative with your photos collaged in. The feeling I'm going for is a magazine feature about your month, written in your words, stitched together into something you can share or look back on.
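
Mechanically, the trigger is simple: on the 1st, find who crossed 7 moments last month and queue a generation job. A sketch, where the table, columns, and enqueueChapterJob helper are all invented for illustration:

  // Sketch of the monthly Chapter trigger. Table/column names and the
  // enqueueChapterJob helper are hypothetical stand-ins.
  import { createClient } from "@supabase/supabase-js";

  declare function enqueueChapterJob(userId: string, monthStart: string): Promise<void>;

  const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_KEY!);

  // Run on the 1st of each month, e.g. from a scheduled job.
  async function generateChapters(monthStart: string, monthEnd: string) {
    const { data: moments } = await supabase
      .from("moments")
      .select("id, user_id")
      .gte("created_at", monthStart)
      .lt("created_at", monthEnd);

    // Count last month's moments per user; 7+ earns a Chapter.
    const counts = new Map<string, number>();
    for (const m of moments ?? []) {
      counts.set(m.user_id, (counts.get(m.user_id) ?? 0) + 1);
    }
    for (const [userId, count] of counts) {
      if (count >= 7) await enqueueChapterJob(userId, monthStart);
    }
  }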

I think AI experiences need to be built around feelings. What do you want the person to feel when the AI hands them an output? That genuinely changes how you think about the UX, chat patterns, artifact format, etc.

Free users get one Chapter—one experience of the feeling I hope Chapters deliver. And I hope that becomes an upgrade point that moves users to Premium…using the feeling as an upsell trigger.

It costs more money than Dig Deeper, because it’s a batch AI job running across all your moments, pulling photos, structuring a narrative, and rendering it into the template I designed. But like I said, the goal isn't utilitarian. It's emotional. I want the first time someone opens their Chapter to feel like getting a letter from their past self. Something you'd send to a friend or your parents. "Look at my month, guys."

It can be shared and I made the sharing UX fun. Will that help growth? Maybe a little. But that's secondary. The primary job is making someone feel something real about their own life and wanting to keep that going.

The free AI feature feeds this. Better moments = better Chapters, which = higher degrees of the feeling that can trigger the upgrade.

Again, mirror mirror on the wall…who’s maybe being the wrongest of them all.

don’t be mean PostHog

The more ambitious and risky AI feature

I’m also hoping the free AI helps feed better data into the premium AI feature I’m most curious/excited/unsure about.

Of the three AI layers, it’s the most technically complex.

Here's the problem I want it to solve: you capture moments and memories over weeks and months. Individual moments are great. But the real magic is in the connections between them—patterns you don't notice, themes running through your life, recurring feelings tied to certain people or places.

No human is going to sit and read back through 200 moments and spot those patterns, or even just reread them. Most journals are static archives we forget about and shove in a cupboard.

But an LLM is genuinely great at this kind of work. The work of finding connections across unstructured text and surfacing observations that are interesting without being obvious. Of making a graph of your memories.

Threads is Ellie (our character) sitting across from you at coffee, having read your journal, and saying: "Hey, have you noticed that every time you talk about mornings on the balcony, you describe this sense of calm that doesn't show up anywhere else in your entries?"

That's an interesting Thread. That makes you think.

What's not interesting: "You mentioned coffee three times."

How I built it

This is where I went deep with AI-assisted architecture. And the process is worth sharing because if you're a non-technical builder touching anything with AI, the way you prompt your tools matters more than which tools you pick.

I split the work: OpenAI handles text embedding (turning text into searchable vectors so the system can find related moments across your history) and Anthropic handles reasoning (deciding what connections mean and what's worth surfacing). I didn't plan that split…I don’t know nearly enough to come up with an idea like that… it came out of a planning process where I forced the agents to push back on me and each other.
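
In code, that split looks roughly like this. A sketch only: searchRelatedMoments stands in for a vector similarity lookup (pgvector or similar) that I'm hand-waving here:

  // Sketch of the Threads split: OpenAI embeds, Anthropic reasons.
  import OpenAI from "openai";
  import Anthropic from "@anthropic-ai/sdk";

  // Stand-in for a vector similarity search over past moments.
  declare function searchRelatedMoments(embedding: number[]): Promise<string[]>;

  const openai = new OpenAI();
  const anthropic = new Anthropic();

  async function findThread(seedMoment: string) {
    // 1) Embedding: turn the seed moment into a vector so related
    //    moments can be found across the whole history.
    const emb = await openai.embeddings.create({
      model: "text-embedding-3-small",
      input: seedMoment,
    });
    const related = await searchRelatedMoments(emb.data[0].embedding);

    // 2) Reasoning: let Claude decide what the connection means and
    //    whether it's worth surfacing at all.
    return anthropic.messages.create({
      model: "claude-sonnet-4-5",
      max_tokens: 400,
      messages: [{
        role: "user",
        content:
          `Related journal moments:\n${related.join("\n---\n")}\n\n` +
          `Surface ONE interesting, non-obvious, kind observation that ` +
          `connects them. If nothing is genuinely interesting, say "no thread".`,
      }],
    });
  }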

When I started building Threads, I didn't just say "build me a RAG pipeline" (that's retrieval-augmented generation: a system that searches your past data to inform new answers). I went into plan mode in Cursor and also built a parallel plan in Claude so they could cross-reference and PR review each other.

The key was putting the AI in the role of a technical person pushing back on my decisions. Not agreeing with me. Challenging me. Really being a technical sparring partner to a product person.

I asked things like:

  • How much would this cost me at scale?

  • How would this work with 500 entries?

  • How long will these runs take?

  • What happens with five years of history?

  • How do we connect dots from three years ago and not build something that only works in small batches?

  • What are 4 different approaches we could take to build this with pros and cons of each?

  • How do we do this so privacy is factored in?

That's just a PM doing PM things but with AI as the engineering counterpart. You define business requirements and experience goals, and force the AI to justify its architecture. Challenge the reasoning. Ask about edge cases.

Make it defend why Option B is better than Option A.

Make it defend its decision again.

Because if you don't do this, you will build something that works beautifully for 20 entries and falls apart at 200. AI is getting better at making good architecture decisions, but the exercise of forcing it to reason through options, and explain them to you (don’t skip reading the planning outputs it makes), is a really good practice.

I need evals… good Threads vs. bad Threads

I don't have any AI evals for Threads yet. But I know what good and bad feel like, and that's actually where this starts.

A good Thread: "When you talk about mornings outside on the balcony, walking to coffee, sitting in the garden—there's this consistent thread of optimism that doesn't show up in your indoor moments. You describe light, space, possibility. Have you noticed that?"

Constructive and interesting and it makes someone think.

A bad Thread: "You often have coffee on a balcony." Thanks Detective. Boring pattern to call out.

And then the worst kind…

A harmful Thread: "You seem really sad when you talk about this person." That's not our job. Threads should be constructive, thoughtful, and interesting. In very light ways they can be challenging like "you talk about wanting to write more but your moments don't mention it, what's holding you back?". Never critical. Never diagnosing. Never making someone feel bad about their own memories.

That’s not a spot we want to accidentally fall into. And the only way you avoid sliding somewhere bad or boring is with evals.

Evaluations tell us if we’re doing a good AI job or not

You can't just ship an AI feature and hope it's good. You can, but you can’t.

A button either works or it's a bug. AI features are non-deterministic. Same input, different outputs, "good" is subjective.

Evals are basically unit tests for your AI. You define what good looks like, what bad looks like, run examples through, and measure how often the output lands where you want.

For Threads, I'm going to build what's called a model-as-judge eval. The idea is to use a separate AI to grade the output of the feature's AI.

e.g. "Here's a Thread. Here are the moments it was based on. Is it interesting? Constructive? Specific enough to feel personal? Does it avoid being harmful?"

The AI judge scores it. You build a dataset. When you change a prompt, you run the eval again to see if things got better or worse.

It’s like having Claude review and give feedback on OpenAI’s work.
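
A minimal version of that judge could look something like this. The rubric, score scale, and JSON shape are my illustration, not a finished eval harness:

  // Sketch of a model-as-judge eval: a second model grades each Thread.
  import Anthropic from "@anthropic-ai/sdk";

  const judge = new Anthropic();

  async function gradeThread(thread: string, sourceMoments: string[]) {
    const msg = await judge.messages.create({
      model: "claude-sonnet-4-5",
      max_tokens: 300,
      messages: [{
        role: "user",
        content:
          `Thread: ${thread}\n\nSource moments:\n${sourceMoments.join("\n")}\n\n` +
          `Score each 1-5: interesting, constructive, specific. ` +
          `Set harmful=true if it diagnoses, criticizes, or could make ` +
          `someone feel bad about their own memories. Reply as JSON: ` +
          `{"interesting":0,"constructive":0,"specific":0,"harmful":false}`,
      }],
    });
    const block = msg.content[0];
    return block.type === "text" ? JSON.parse(block.text) : null;
  }

Run that over a saved set of Threads before and after a prompt change, and you can see whether the change actually helped.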

I haven't built this yet but plan to in the coming weeks. I'd been avoiding this part because it isn't the fun, shiny work, and I felt it needed more technical skill than I have.

But that’s not true: you don't need to be technical to start. At its simplest, I’ve been learning, an eval is a spreadsheet.

  • Column A: the input.

  • Column B: what the AI produced.

  • Column C: what good looks like.

  • Column D: your score.

50 examples will teach you more about your feature's quality than any amount of prompt tweaking.

The fancy model-as-judge setup just automates Column D. But the thinking starts on paper with YOUR judgement about good vs bad.
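
And if spreadsheets aren't your thing, the same table is just an array. Scores here are hand-filled, which is the point: Column D stays human until you trust a judge model:

  // The spreadsheet-as-code version of a starter eval. Scores are human-entered.
  type EvalRow = {
    input: string;  // Column A: the moments the output was based on
    output: string; // Column B: what the AI produced
    ideal: string;  // Column C: what good looks like
    score: number;  // Column D: your 1-5 judgement
  };

  const rows: EvalRow[] = [
    {
      input: "Three balcony-morning moments…",
      output: "You often have coffee on a balcony.",
      ideal: "Connects balcony mornings to a feeling, not just a frequency.",
      score: 2, // boring: a pattern without an insight
    },
  ];

  const avg = rows.reduce((sum, r) => sum + r.score, 0) / rows.length;
  console.log(`average score over ${rows.length} examples: ${avg}`);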

You can’t outsource that taste to AI. You can, but you can’t.

Not next, but soon

Evals are not the next most important thing for me to focus on, because nobody is getting Threads yet. I need more users generating more moments so the connections get richer. Later I can build evals and figure out how often Threads should surface. Too much and it's noise. Too little and it's forgotten.

The next thing is getting this damn App Store approval so it’s easy for everyone to try out—hoping next time I can share the website and app link.

we can work together…

Have an idea for a product?
I'll bring you my product experience, help you shape the right thing, and design & build it for you in 12 days.

seriously, learn more here and shoot me an email (or reply to this email)

———————————————————————————

+ catch up on past entries

Keep Reading