What we learnt building Opal
If you hadn’t noticed, the world is changing, and software development is changing with it.
Quality in software is more important than ever, and it affords leverage to those who have it.
Two software systems built through agentic workflows will likely compete at the boundary between human and model intelligence. A shrinking window?
This places a significant premium on security, data quality, extensibility and innovation.
It also strains traditional distribution mechanisms (e.g. digital advertising).
Here is the core agentic development loop we used while building Opal:
Step 1. Prompt engineering and planning: 1-2 hours (including Notion management)
Step 2. Functional development: 1-2 hours (agentic session execution)
Step 3. Refinement and evaluation: > 3 days. Review | Consolidation | Testing | Bug fixing
& Repeat ^
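For concreteness, here is a minimal sketch of how a cycle like this can be wired together. It is illustrative only: write_plan, run_agentic_session and run_review_pass are hypothetical stand-ins for whatever planning docs, coding agent and review tooling you actually use; the point is just the shape of the loop.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the plan -> build -> evaluate loop described above.
# None of these names refer to a real tool or library.

@dataclass
class Cycle:
    plan: str                                           # Step 1 output
    diffs: list[str] = field(default_factory=list)      # Step 2 output
    findings: list[str] = field(default_factory=list)   # Step 3 output

def write_plan(feature: str) -> str:
    """Step 1 (1-2 hours): turn a feature idea into an explicit plan/prompt."""
    return f"Plan for: {feature}\n- constraints\n- acceptance criteria\n- out of scope"

def run_agentic_session(plan: str) -> list[str]:
    """Step 2 (1-2 hours): hand the plan to a coding agent and collect its changes."""
    return [f"diff implementing: {line}" for line in plan.splitlines() if line.startswith("-")]

def run_review_pass(diffs: list[str], focus: str) -> list[str]:
    """Step 3 (days): human + model review with a single focus area."""
    return [f"[{focus}] check {d}" for d in diffs]

def run_cycle(feature: str) -> Cycle:
    cycle = Cycle(plan=write_plan(feature))
    cycle.diffs = run_agentic_session(cycle.plan)
    # The expensive part: days of review, consolidation, testing and bug fixing.
    for focus in ("correctness", "security", "tests"):
        cycle.findings += run_review_pass(cycle.diffs, focus)
    return cycle

if __name__ == "__main__":
    result = run_cycle("product commentary editor")
    print(f"{len(result.diffs)} diffs, {len(result.findings)} review findings")
```

Almost all of the calendar time sits in that last stage, which is what the next few paragraphs are about.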
Any safety-critical (or marginally safety-critical) system relies heavily on quality assurance, because humans need to evaluate, understand and manage the output end-to-end. The evaluation burden is proportional to code velocity… which has no ceiling in agentic coding.
Opal is lightly safety-critical. It handles both business and consumer information, and its product commentary needs to have integrity, particularly for food products.
Given this, and given that agentic coding produces errors, extended review periods are critical, just as they are in human-led development.
This means most development gains come from cadence and throughput in Step 2.
Step 3 involves both human and AI-assisted iterative improvement cycles (repeatedly asking agentic models to improve their own output), as sketched below.
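As a rough illustration of what that inner loop looks like, the sketch below asks a stand-in model to list issues, lets a human filter them, applies what survives, and stops when nothing new comes back or a round budget runs out. model_review, human_filter and apply_fix are hypothetical placeholders, not a real agent API.

```python
# Hypothetical sketch of an AI-assisted improvement cycle (Step 3).
# model_review, human_filter and apply_fix stand in for agent calls
# and human judgement; they are not a real API.

MAX_ROUNDS = 5  # agents rarely decide to stop on their own, so cap the loop

def model_review(codebase: dict[str, str]) -> list[str]:
    """Ask the model for concrete issues it can still find."""
    return [f"{path}: tighten error handling"
            for path, src in codebase.items() if "fixed:" not in src]

def human_filter(issues: list[str]) -> list[str]:
    """Humans decide which suggestions are actually worth acting on."""
    return [issue for issue in issues if "error handling" in issue]

def apply_fix(codebase: dict[str, str], issue: str) -> None:
    """Apply an accepted suggestion (here, just annotate the file)."""
    path = issue.split(":")[0]
    codebase[path] += "\n# fixed: " + issue

def improvement_cycle(codebase: dict[str, str]) -> None:
    for _ in range(MAX_ROUNDS):
        issues = human_filter(model_review(codebase))
        if not issues:  # nothing left that is worth doing
            break
        for issue in issues:
            apply_fix(codebase, issue)
        # in practice: re-run the test suite here before the next round

if __name__ == "__main__":
    repo = {"app/api.py": "def handler(): ..."}
    improvement_cycle(repo)
    print(repo["app/api.py"])
```

The round cap matters more than it looks: left to itself, the loop tends to keep finding something, which is exactly the overvaluation problem described further down.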
While implementing this process over ten weeks, we realized the following:
1. Planning significantly influences development outcome across long and short task horizons.
2. Models often cannot determine when not doing work is a better outcome than working, which arguably necessitates human intervention.
3. Context delivery, accessibility and timing have compounding effects on development outcomes.
We already knew that planning significantly affects development outcomes, but this was even more pronounced during this development cycle. We learnt that on long-horizon tasks, poor planning compounds error exponentially.
We used Notion to implement daily repository-level review cycles, e.g. with a frontend, backend or security focus.
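As an illustration only (the focus areas and prompt wording below are made up, not our actual Notion templates), a simple daily rotation is enough to give each review a single focus:

```python
import datetime

# Illustrative sketch of a daily repository-level review rotation.
# Focus areas and prompts are examples, not our actual Notion templates.

REVIEW_ROTATION = [
    ("frontend", "Review UI components for state handling, accessibility and dead code."),
    ("backend", "Review API handlers for validation, error handling and data access."),
    ("security", "Review auth, secrets handling and anything touching consumer data."),
]

def todays_review(date: datetime.date) -> tuple[str, str]:
    """Rotate through the focus areas one day at a time."""
    return REVIEW_ROTATION[date.toordinal() % len(REVIEW_ROTATION)]

if __name__ == "__main__":
    focus, prompt = todays_review(datetime.date.today())
    # The prompt is then handed to the review agent along with the day's diffs.
    print(f"Today's focus: {focus}\nPrompt: {prompt}")
```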
Running these cycles, we found the model would consistently highlight similar issues over time. This is positive developmental behavior, and it holds in human environments too. However, it creates a non-trivial limitation: if the model is more intelligent than the reviewer (already true, or at worst likely true in the future), the reviewer cannot estimate the value of acting on a suggestion. That bias causes the reviewer to overvalue recommendations, wasting significant time. The effect is exaggerated by poor context delivery to, and visibility of, the model (Step 3).
If a model has the proper context at the right time, it makes judgements more effectively. We are referring to things like logging-access agents, architecture agents and so on, where the model derives the context it needs dynamically, or the planner delivers it more precisely.
Essentially, variable context delivery. Probably a useful focal point in 2026.
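A minimal sketch of what we mean, assuming a few hypothetical context sources (recent logs, architecture notes, schema docs): rather than packing the same context into every prompt, a small router picks only the sources a given task needs.

```python
# Hypothetical sketch of variable context delivery: select context per task
# instead of sending everything every time. The sources and routing rules
# below are examples, not a real API.

CONTEXT_SOURCES = {
    "logs": lambda task: f"Recent error logs relevant to: {task}",
    "architecture": lambda task: "Service boundaries and ownership notes",
    "schema": lambda task: "Current database schema for the affected tables",
}

KEYWORD_ROUTES = {
    "bug": ["logs", "schema"],
    "refactor": ["architecture"],
    "migration": ["schema", "architecture"],
}

def select_context(task: str) -> list[str]:
    """Decide which context sources this task actually needs."""
    names = {name
             for keyword, sources in KEYWORD_ROUTES.items()
             if keyword in task.lower()
             for name in sources}
    return [CONTEXT_SOURCES[name](task) for name in sorted(names)]

def build_prompt(task: str) -> str:
    """Deliver only the selected context alongside the task description."""
    context = "\n".join(select_context(task)) or "No extra context selected."
    return f"{context}\n\nTask: {task}"

if __name__ == "__main__":
    print(build_prompt("Fix the bug in the nutrition-facts migration"))
```

The keyword routing here is deliberately crude; the interesting versions are the ones where the model derives that selection itself, or the planner makes it for the model.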
————————————————————————————————————————————————————-
That summarizes our main takeaways from our last development cycle.
You can check out Opal here.
Does your business have a unique product story? Use Opal to tell people about it.
Maybe you have an innovative idea? We can build it for you!
Wondering how to onboard AI workflows for your business in 2026? Get in touch!
Thanks,
株式会社TiviTi