The Risks of Black-Box Programming with AI

In QA and testing, it is common to distinguish between black-box testing and white-box testing. Black-box testing is done without knowledge of the system's internal workings: you test only its functionality. White-box testing is done with knowledge of the system's internal workings or structure, and the tests are designed to exercise the different cases the code handles.
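To make the distinction concrete, here is a minimal sketch in Python. The function `shipping_cost` and its pricing rule are hypothetical, invented purely for illustration; the point is only how the two kinds of test differ in what they assume the tester knows.

```python
def shipping_cost(weight_kg: float) -> float:
    """Hypothetical function under test: flat rate up to 5 kg, then a per-kg surcharge."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    if weight_kg <= 5:
        return 4.0
    return 4.0 + (weight_kg - 5) * 1.5


def test_black_box() -> None:
    # Black-box: only the stated behavior is known ("a 2 kg parcel costs 4.00").
    assert shipping_cost(2) == 4.0


def test_white_box() -> None:
    # White-box: the tester has read the code and targets each branch and boundary.
    assert shipping_cost(5) == 4.0   # upper edge of the flat-rate branch
    assert shipping_cost(7) == 7.0   # surcharge branch: 4.0 + 2 * 1.5
    try:
        shipping_cost(0)             # input-validation branch
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for a non-positive weight")


test_black_box()
test_white_box()
```

The black-box test would pass even if the surcharge branch were wrong, because nothing in the functional description pointed the tester at it. The white-box test covers it only because someone read the code.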

With artificial intelligence tools, what we could call black-box programming is emerging. The user tells the AI what they want an application to do, and the AI generates it.
The user does not know how their application works internally, is not interested in knowing, or is unable to understand the implications of how the solution has been implemented.
This would be a typical use case for tools like Lovable and several others that are beginning to emerge, but accepting AI-generated code without reviewing and understanding it could also be considered black-box programming.

In contrast to black-box programming, there is also white-box programming with AI. In this case, there is a technical lead who decides on the architecture and how to implement the solution, and uses the AI to implement it faster. Depending on the context and on how much the lead is able to understand and review, the pieces delegated to the AI are larger or smaller.
The product developer's role changes: instead of writing lines of code, they design and lead, orchestrate, and review, remaining fully responsible for the "how." My recommendation is therefore that the fragments requested from the AI be small and focused enough to be reviewed, understood, and maintained afterward: nothing larger than a pull request that we would actually review and understand. Complete knowledge is unattainable, and there are various shades of white, but white-box programming consists of understanding the solution as completely as possible; it is not about dividing a large black box into a multitude of smaller black boxes.

Although I obviously have my preference, neither approach is good or bad per se; you simply have to understand them, weigh their pros and cons, and above all understand their risks.

In the case of black-box programming, these are the risks that, in my opinion, should be assessed:

  • Risk 1: Lack of accountability: As I mentioned in a previous article, the use of AI must not lead to a dilution of responsibility. If the generated software does not work correctly, has performance, security, or stability problems, or has unpredictable side effects, it must be clear who is responsible. Understanding neither the how nor the why of those problems makes it harder to take on that responsibility.

  • Risk 2: Quality out of control: If we do not know how the software was generated, we cannot tell whether it is a quality product. This is nothing new; most stakeholders just want the software to work, regardless of whether it was built with quality. But those who work with software architecture know that when quality is lacking and technical debt accumulates, you end up paying for it: maintenance becomes difficult, errors multiply, the software gets slower and more complicated, and ultimately it has to be rebuilt from scratch.

  • Risk 3: Lack of knowledge of software architecture: Being unaware of the software architecture means not knowing its strengths and weaknesses, which parts will need to be reinforced, or how it can or should evolve. It also means not knowing what its dependencies and potential vulnerabilities are.

  • Risk 4: Performance: The performance of an application is closely tied to its architecture. The issue may be scalability problems or "simply" an inefficient use of resources, but without understanding how an application is built, it is impossible not only to carry out optimizations ourselves but even to propose them.

  • Risk 5: Implicit knowledge: When we define and implement software, each of the actors involved brings context and implicit knowledge that adds value. For example, a software architect or developer brings knowledge about architecture, design patterns, and data security, as well as functional insight from previous experiences and implementations. Because it is implicit, much of this knowledge is neither passed on to the AI nor requested of it, resulting in a lower-quality solution than if several actors with diverse knowledge had taken part in producing it.

  • Risk 6: Evolution vs revolution (refactor vs replatform): We must keep in mind that software development is an iterative process. You don't just build an application and that's it; it has to evolve, be fixed, be improved, and so on. If you do not know how it was built, you cannot specify how particular aspects should evolve. Instead, new high-level specifications are added, and there is a very high risk that the AI will rebuild the existing code to meet them, breaking previous specifications, repeating earlier mistakes, and in general bringing about all the problems that arise when software is rebuilt instead of refactored.

  • Risk 7: Lost opportunities due to lack of knowledge: There are cases where knowing how the software is built and understanding the technology's capabilities generates business opportunities, new functional options, and so on. This is so-called "bottom-up" product development. By bypassing this knowledge, and the people who hold it, this creative capacity is lost.

Knowing these risks, one must be careful with the implications of black-box programming. Nevertheless, as I mentioned before, it is not a bad approach in itself; there are cases where this type of programming is a good fit, since it requires fewer human resources (I won't get into token consumption here).

For example, it can be a good fit for proofs of concept, pilots, or prototypes, where performance and production stability matter less than speed and the ability to test ideas, and where there is no intention of producing software that will last, improve, and evolve.

In summary, when we program with AI, just as when we outsource software to a consultancy firm, we must consider the implications of not knowing the code, the risks that entails, and how we are going to maintain it. We have to decide whether we are commissioning a solution to a problem or whether we want to implement it ourselves.