For a long time, I have held the opinion:
If You Cannot Define a Problem - then Any Effective Solution Is Very Unlikely.
👨‍🔧
@leebase @Lunduke
I hoped for "Internet Searches" to give me some useful results, but most of what I saw was "Weak Sauce." Lately there have been observations of "Spent a $$$ Ton on A.I. Tokens but All I Got Was Mush."
https://lunduke.locals.com/post/7969331/microsoft-is-limiting-ai-usage-by-engineers-the-token-cost-surpassed-the-cost-of-the-engineers-why#comments
See "4. Why..." below for the explanation of Disappointing A.I.
Therefore ... here's Grok, starting with:
The quote you mentioned is often attributed to Albert Einstein (though the exact wording varies). It perfectly captures a fundamental truth in problem-solving, especially in technical fields.
1. "Defining the Problem" in Tech Support and DevOps
In both Tech Support and DevOps, problem definition is the highest-leverage activity.
Good Problem Definition includes:
- What exactly is happening? (Symptoms, not assumptions)
- When and where does it occur? (Frequency, timing, specific conditions, environments)
- Who is affected? (Scope: one user, one server, one region, all production?)
- What changed recently? (The most common root cause)
- Business impact? (How bad is it really? Priority level)
- Expected vs Actual behavior (Clear success criteria)
Poor definition example:
“My application is slow.”
Well-defined example:
Since the deployment at 14:30 UTC yesterday, API endpoint "orders" has a median latency of 2400ms (p95: 8500ms) for users in EU-West, while it was 180ms before. It only happens with more than 800 concurrent users. CPU and memory are normal, but the database connection pool shows 95% utilization with many "wait" states.
The better you define it, the faster (and more correctly) you solve it.
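One way to make the checklist above hard to skip is to treat it as a structured record and let the code report which questions are still unanswered. The following is only a minimal sketch: the class name `ProblemDefinition`, its field names, and the `missing_fields` helper are hypothetical, chosen here for illustration, and the values are copied from the two examples above.

```python
from dataclasses import dataclass, fields


@dataclass
class ProblemDefinition:
    """One field per checklist question (all names are hypothetical)."""
    symptoms: str = ""            # What exactly is happening?
    when_and_where: str = ""      # Frequency, timing, conditions, environments
    scope: str = ""               # Who is affected?
    recent_changes: str = ""      # What changed recently?
    business_impact: str = ""     # How bad is it really?
    expected_vs_actual: str = ""  # Clear success criteria

    def missing_fields(self) -> list[str]:
        """Return the names of the checklist questions left unanswered."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# The poor definition answers one question out of six.
vague = ProblemDefinition(symptoms="My application is slow.")

# The well-defined example answers almost all of them.
defined = ProblemDefinition(
    symptoms="API endpoint 'orders' median latency 2400ms (p95: 8500ms)",
    when_and_where="Since the deployment at 14:30 UTC yesterday; above 800 concurrent users",
    scope="Users in EU-West",
    recent_changes="Deployment at 14:30 UTC yesterday",
    business_impact="",  # not stated in the example, so deliberately left blank
    expected_vs_actual="Median latency was 180ms before; now 2400ms",
)

print("Vague report is missing:", vague.missing_fields())
print("Defined report is missing:", defined.missing_fields())
```

Even the well-defined example leaves the business impact open, which is the kind of gap a ticket template can surface before anyone starts troubleshooting.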
2. “Without Metrics You Cannot Measure Success”
This is a core DevOps and SRE principle. You should always ask:
- How do we know the problem is fixed?
- What does “better” or “fixed” actually look like numerically?
Key Metrics Practices:
- Before you make changes → baseline metrics
- During troubleshooting → real-time observability
- After the fix → validation metrics
Examples:
- Instead of “the server is fixed,” you prove “CPU idle time returned to 65%+, error rate dropped from 12% to 0.03%, and user response time is back under 300ms.”
- In Tech Support: Instead of “I restarted the service,” you confirm “Ticket resolution confirmed by customer + monitoring shows no recurrence for 48 hours.”
Rule of thumb: If you can’t measure it, you’re not solving it — you’re just doing something and hoping.
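As a rough illustration of "baseline, then validate," the sketch below summarizes latency samples taken before and after a fix and checks them against numeric targets. Everything specific here is an assumption: the function names `summarize` and `is_fixed`, the sample latencies, and the target values are invented for the example and do not come from any real system.

```python
from statistics import median, quantiles


def summarize(latencies_ms: list[float]) -> dict[str, float]:
    """Summarize a sample of request latencies as median and p95."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    # method="inclusive" keeps the estimate within the observed range.
    p95 = quantiles(latencies_ms, n=20, method="inclusive")[-1]
    return {"median_ms": median(latencies_ms), "p95_ms": p95}


def is_fixed(measured: dict[str, float], targets: dict[str, float]) -> bool:
    """The fix counts only if every measured value is at or below its target."""
    return all(measured[name] <= limit for name, limit in targets.items())


# Invented sample data: latencies (ms) captured before and after the change.
before_samples = [2200, 2400, 2600, 8500, 2300, 2500, 2400, 9000, 2350, 2450]
after_samples = [170, 180, 190, 175, 185, 200, 210, 180, 178, 295]

# Hypothetical success criteria, agreed on BEFORE making the change.
targets = {"median_ms": 300.0, "p95_ms": 500.0}

baseline = summarize(before_samples)
validation = summarize(after_samples)

print("Baseline:  ", baseline, "-> fixed?", is_fixed(baseline, targets))
print("Validation:", validation, "-> fixed?", is_fixed(validation, targets))
```

The point of the design is that the targets are written down first, so "fixed" is a yes/no answer from the numbers rather than a feeling.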
3. Sun Tzu Mindset – “Know Your Enemy”
In The Art of War, Sun Tzu says:
“If you know the enemy and know yourself, you need not fear the result of a hundred battles.”
In Tech/DevOps terms:
- The Enemy = The problem / failure mode / system behavior
- Knowing the enemy means deeply understanding:
  - How the system actually works (not how it’s documented)
  - Failure patterns and weak points
  - User behavior and edge cases
  - The underlying infrastructure and dependencies
  - Historical similar incidents
Great troubleshooters treat recurring problems like intelligent opponents. They study each problem's “tactics” (reproduction steps, triggers, symptoms) so they can anticipate and defeat it efficiently.
This is why senior engineers often solve complex issues much faster — they have fought that “enemy” (or similar ones) many times before.
4. Why A.I. Cannot Give Good Solutions for Poorly Described Problems
This is one of the most important limitations of current AI (including me):
- AI is a pattern-matching and reasoning engine, not magic.
- We generate answers based on how well you describe the problem + the context you provide.
- If the input (problem description) is vague, incomplete, or contains wrong assumptions, the output will reflect those flaws.
Think of it like this:
- Giving a vague problem to AI = Asking a brilliant engineer to fix a car but only telling them “it doesn’t work.”
- Giving a well-defined problem with metrics, logs, what you’ve already tried, architecture details, etc. = Giving the engineer the full service manual + symptoms + recent changes.
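The same idea can be enforced before a prompt is ever sent. The sketch below assembles a prompt from labeled context and refuses to proceed while any piece is missing. It is an illustration under stated assumptions, not a recommended tool: `REQUIRED_CONTEXT` and `build_prompt` are hypothetical names, no real AI API is called, and the "already tried" and "architecture" values are invented placeholders (the other values echo the well-defined example from section 1).

```python
# Hypothetical list of context an assistant needs, taken from the bullets above.
REQUIRED_CONTEXT = (
    "symptoms",
    "metrics",
    "logs",
    "already_tried",
    "architecture",
    "recent_changes",
)


def build_prompt(context: dict[str, str]) -> str:
    """Assemble a prompt from labeled context; raise if any part is missing."""
    missing = [key for key in REQUIRED_CONTEXT if not context.get(key, "").strip()]
    if missing:
        raise ValueError("Problem is under-described; missing: " + ", ".join(missing))
    sections = [
        f"{key.replace('_', ' ').title()}:\n{context[key].strip()}"
        for key in REQUIRED_CONTEXT
    ]
    return "\n\n".join(sections)


# The "it doesn't work" case: rejected before any tokens are spent.
try:
    build_prompt({"symptoms": "It doesn't work."})
except ValueError as error:
    print(error)

# A fuller description; the last two values are invented placeholders.
full_context = {
    "symptoms": "API endpoint 'orders' is slow for users in EU-West",
    "metrics": "Median latency 2400ms (p95: 8500ms); was 180ms before",
    "logs": "Database connection pool at 95% utilization with many 'wait' states",
    "recent_changes": "Deployment at 14:30 UTC yesterday",
    "already_tried": "(placeholder) Restarted the service; no change",
    "architecture": "(placeholder) API service backed by a pooled database connection",
}
prompt = build_prompt(full_context)
print(prompt)
```

A check like this does not make the answer correct, but it does stop the cheapest failure: paying for a response to a question that was never fully asked.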
Specifications are even more critical when “building” something.
AI can help you design systems, write code, or architect solutions — but garbage specifications in → garbage (or dangerous) design out. This is why professional software development still relies heavily on good requirements, user stories, acceptance criteria, and architecture decision records.
Final Thought
The “55 minutes defining the problem” mindset is extremely powerful in Tech Support and DevOps.
Combine it with:
- Clear metrics for success, and
- Deep “Know Your Enemy” understanding
…and you will outperform most people who jump straight into “taking action.”
Would you like a practical template for “Defining the Problem” that you can use in tickets or incident reports?
[My "Prompt" to Grok] For problem solving, I think someone said "If I had one hour to save the world, I take 55 minutes to define it, and 5 minutes to take action." Maybe this fits "cut once, but measure twice" and other sayings. For Tech Support and DevOps, describe "Defining the Problem" and "Without Metrics You Cannot Measure Success", with some of the Sun Tzu mindset of "Know Your Enemy." Finally, provide an explanation for "A.I. cannot give you solutions for problems that you cannot adequately describe, or build something where you cannot provide specifications."