<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://www.questionpro.com/engineering/feed.xml" rel="self" type="application/atom+xml" /><link href="https://www.questionpro.com/engineering/" rel="alternate" type="text/html" /><updated>2026-02-26T09:22:22+00:00</updated><id>https://www.questionpro.com/engineering/feed.xml</id><title type="html">Engineering Blog</title><subtitle>Discover QuestionPro&apos;s engineering culture, technical challenges, and career opportunities. Learn how our engineering team builds scalable survey platforms serving millions of users.</subtitle><author><name>Your Name</name></author><entry><title type="html">Building Scalable AI Agents: A Journey Through Multi-Agent Architecture</title><link href="https://www.questionpro.com/engineering/ai%20&%20machine%20learning/software%20architecture/engineering/cost%20optimization/building-scalable-ai-agents/" rel="alternate" type="text/html" title="Building Scalable AI Agents: A Journey Through Multi-Agent Architecture" /><published>2026-02-21T00:00:00+00:00</published><updated>2026-02-21T00:00:00+00:00</updated><id>https://www.questionpro.com/engineering/ai%20&amp;%20machine%20learning/software%20architecture/engineering/cost%20optimization/building-scalable-ai-agents</id><content type="html" xml:base="https://www.questionpro.com/engineering/ai%20&amp;%20machine%20learning/software%20architecture/engineering/cost%20optimization/building-scalable-ai-agents/"><![CDATA[<h2 id="or-how-i-learned-to-stop-worrying-and-love-the-token-budget">Or: How I Learned to Stop Worrying and Love the Token Budget</h2>

<h2 id="introduction">Introduction</h2>

<p>Remember when you thought building an AI agent would be easy? “Just throw some prompts at GPT-4 and call it a day,” you said.</p>

<p>“What could go wrong?” you said.</p>

<p><em>Narrator: Everything went wrong.</em></p>

<p>Large Language Models have revolutionized intelligent applications, but here’s what nobody tells you at those fancy AI conferences: scaling an AI agent from “cool demo” to “production system that doesn’t bankrupt your company” is… <em>challenging</em>. Token costs spiral like a startup’s cloud bill, context windows overflow faster than your coffee cup on Monday morning, and response times make dial-up internet look speedy.</p>

<p>This is our journey (currently in progress, bugs included) from a basic implementation to building a production-ready, cost-efficient multi-agent system for QuestionPro’s BI platform. We’re exploring three key patterns, making mistakes in real-time, and documenting everything so you don’t have to. Think of this as “Mythbusters” but for AI architecture, with 100% more token optimization and slightly fewer explosions.</p>

<hr />

<h2 id="part-1-the-basic-agent---or-how-hard-could-it-be">Part 1: The Basic Agent - Or “How Hard Could It Be?”</h2>

<h3 id="the-innocent-beginning">The Innocent Beginning</h3>

<p>Like every developer who’s just discovered LangGraph, we started with the “obvious” approach, THE MONOLITHIC AGENT: one agent to rule them all, one agent to find them, one agent to bring them all, and in the darkness bind them (to a $10,000/month OpenAI bill… roughly).</p>

<p><img src="/engineering/assets/images/monolithic-agent-architecture.png" alt="Monolithic Agent Architecture" /></p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Single agent with all capabilities</span>
<span class="kd">const</span> <span class="nx">agent</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">ChatOpenAI</span><span class="p">({</span> <span class="na">model</span><span class="p">:</span> <span class="dl">"</span><span class="s2">gpt-4</span><span class="dl">"</span> <span class="p">}).</span><span class="nf">bindTools</span><span class="p">([</span>
  <span class="nx">createDashboard</span><span class="p">,</span>
  <span class="nx">getDashboard</span><span class="p">,</span>
  <span class="nx">createWidget</span><span class="p">,</span>
  <span class="nx">listSurveys</span><span class="p">,</span>
  <span class="nx">getQuestions</span><span class="p">,</span>
  <span class="c1">// ... 40+ more tools</span>
<span class="p">]);</span>

<span class="kd">const</span> <span class="nx">systemPrompt</span> <span class="o">=</span> <span class="s2">`
You are a dashboard analytics assistant.

Available dashboards: [... 500 lines of context ...]
Survey schemas: [... 2000 lines of schemas ...]
Widget configurations: [... 1500 lines of configs ...]
`</span><span class="p">;</span>
</code></pre></div></div>

<h3 id="the-oh-no-moment">The “Oh No” Moment</h3>

<p>You know that feeling when you check your AI bill and your heart skips a beat? That was us, just from projecting what “production” would look like.</p>

<p><strong>Our “Everything is Fine” Metrics:</strong></p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>📊 The Uncomfortable Truth:
- Avg tokens per request: 15,000 (narrator: this is not good)
- Cost per conversation: $1.50 (multiply by users... *sweats*)
- P95 latency: 8 seconds (users hate us)
- Projected monthly costs: $4,500 (CFO hates us more)
- Success rate: 89% (11% of the time, it works every time!)

🔴 The "We Need to Talk" Problems:
- Context overflow (GPT-4 politely hanging up on us)
- Tool selection paralysis (like Netflix, which movie to watch)
- Exponential cost growth
- LLM having an existential crisis (overwhelmed with 40+ tools)
</code></pre></div></div>

<h3 id="the-exponential-growth-problem-or-math-is-unforgiving">The Exponential Growth Problem (Or: Math is Unforgiving)</h3>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Turn 1:  12,500 tokens  ✅ "This is fine"
Turn 5:  22,000 tokens  ⚠️ "This is less fine"
Turn 10: 37,000 tokens  ❌ "This is fire, everything is fire"
</code></pre></div></div>
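<p>The growth is easy to model: every request resends the system prompt plus the full conversation history, so per-request tokens grow linearly and cumulative spend roughly quadratically. A back-of-napkin sketch (the per-turn constants are illustrative estimates, not measurements):</p>

```typescript
// Every request carries the bloated system prompt plus all prior turns,
// so per-request tokens grow linearly and cumulative spend quadratically.
// SYSTEM_TOKENS and TURN_TOKENS are illustrative estimates.
const SYSTEM_TOKENS = 10_000; // monolithic prompt + 40+ tool schemas
const TURN_TOKENS = 2_500; // average user message + assistant reply

function tokensAtTurn(turn: number): number {
  // Request at turn N carries the system prompt plus all prior turns.
  return SYSTEM_TOKENS + turn * TURN_TOKENS;
}

function cumulativeTokens(turns: number): number {
  let total = 0;
  for (let t = 1; t <= turns; t++) total += tokensAtTurn(t);
  return total;
}

// tokensAtTurn(1)  → 12,500
// tokensAtTurn(10) → 35,000 — and you pay for it on every single turn.
```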

<p><strong>We were literally pricing ourselves out of business, one helpful conversation at a time.</strong></p>

<p>The real kicker? We kept asking “Can we just add more features?” while remaining blissfully unaware of the bill we were generating.</p>

<p>Note: These numbers are rough estimates from our experiments and paper-napkin calculations. We’re still actively developing and iterating, but they should point your thinking in the right direction.</p>

<hr />

<h2 id="part-2-the-skills-pattern---progressive-disclosure">Part 2: The Skills Pattern - Progressive Disclosure</h2>

<h3 id="the-aha-moment">The “Aha!” Moment</h3>

<h4 id="or-how-we-discovered-that-laziness-is-actually-a-virtue">Or: How We Discovered That Laziness is Actually a Virtue</h4>

<p>After three existential crises where we admitted our agent was hemorrhaging money, we stumbled upon a life-changing concept:</p>

<blockquote>
  <p>“What if… and hear me out… we DON’T load everything into the prompt like we’re packing for a 2-week vacation?”</p>
</blockquote>

<p>Revolutionary, I know. We felt like we’d discovered fire, except fire was actually “reading the documentation properly.”</p>

<p>Enter <strong>progressive disclosure</strong> - a fancy term for “only fetch stuff when you actually need it,” which is basically how every efficient human operates but somehow felt groundbreaking when applied to AI. (Yes, we’re aware of the irony.)</p>

<blockquote>
  <p>“Don’t load everything upfront. Load only what you need, when you need it.”</p>
</blockquote>

<p>Instead of including all context in every request, we implemented <strong>progressive disclosure</strong> - the agent loads specialized knowledge through the skills pattern.</p>

<p><img src="/engineering/assets/images/skills-pattern-visualization.png" alt="Skills Pattern Visualization" /></p>

<h3 id="implementation">Implementation</h3>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Lightweight system prompt with skill metadata only</span>
<span class="kd">const</span> <span class="nx">systemPrompt</span> <span class="o">=</span> <span class="s2">`
You are a dashboard analytics assistant.

Available Skills (load when needed):
- survey_schema: Get survey questions and field types
- widget_config: Get chart-specific settings  
- dashboard_filters: Get filter options
- styling_themes: Get appearance options

Current context:
- Workspace: 1
- Dashboard: </span><span class="p">${</span><span class="nx">state</span><span class="p">.</span><span class="nx">dashboardId</span> <span class="o">||</span> <span class="dl">"</span><span class="s2">none</span><span class="dl">"</span><span class="p">}</span><span class="s2">
`</span><span class="p">;</span>

<span class="c1">// Skills load on-demand</span>
<span class="kd">const</span> <span class="nx">loadSkillTool</span> <span class="o">=</span> <span class="nf">tool</span><span class="p">(</span><span class="k">async </span><span class="p">({</span> <span class="nx">skillName</span><span class="p">,</span> <span class="nx">resourceId</span> <span class="p">})</span> <span class="o">=&gt;</span> <span class="p">{</span>
  <span class="k">switch </span><span class="p">(</span><span class="nx">skillName</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">case</span> <span class="dl">"</span><span class="s2">survey_schema</span><span class="dl">"</span><span class="p">:</span>
      <span class="k">return</span> <span class="k">await</span> <span class="nf">fetchSurveySchema</span><span class="p">(</span><span class="nx">resourceId</span><span class="p">);</span>
    <span class="c1">// Returns ~2,000 tokens only when needed</span>

    <span class="k">case</span> <span class="dl">"</span><span class="s2">widget_config</span><span class="dl">"</span><span class="p">:</span>
      <span class="k">return</span> <span class="k">await</span> <span class="nf">fetchWidgetConfig</span><span class="p">(</span><span class="nx">resourceId</span><span class="p">);</span>
    <span class="c1">// Returns ~1,500 tokens only when needed</span>
  <span class="p">}</span>
<span class="p">});</span>
</code></pre></div></div>

<h3 id="skills-in-action">Skills in Action</h3>

<p><strong>User</strong>: “Create a gender chart from Customer Survey”</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Request 1: Agent sees lightweight prompt
├─ System: 1,000 tokens (was 8,000)
├─ Tools: 1,500 tokens
└─ Messages: 2,000 tokens
   Total: 4,500 tokens ✅

Agent decides: "I need the survey schema"
└─ Calls: load_skill("survey_schema", 12345)

Request 2: Agent receives survey details
├─ Previous context: 4,500 tokens
├─ Survey schema: 2,000 tokens
└─ Total: 6,500 tokens ✅

Agent creates chart with correct data mapping
</code></pre></div></div>

<h3 id="the-results-or-when-theory-meets-reality-and-actually-works">The Results (Or: When Theory Meets Reality and Actually Works)</h3>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>📊 Skills Pattern Metrics (We Were Shocked Too):

- Avg tokens: 6,500 (was 15,000) 🎉
- Cost per conversation: $0.65 (was $1.50) 💰
- P95 latency: 4s (was 8s) ⚡
- Monthly costs: $1,950 (was $4,500) 🎊

✅ 57% cost reduction (CFO can sleep better)
✅ 50% faster responses (users will hate us a little less now)
✅ Stable token usage (no more exponential nightmares)
✅ Team morale: Significantly improved

</code></pre></div></div>

<p><em>As this was my late-night hustle when no one was around, I just high-fived my wall.</em></p>

<h3 id="but-wait-the-plot-thickens">But Wait… (The Plot Thickens)</h3>

<p>Just when we thought we’d solved AI, reality decided to humble us. Again.</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>❌ The "Not So Fast" Problems:

1. Agent handling 40+ tools → Like asking someone to pick
   their favorite child, but with 40 children
2. Mixed concerns → Debugging became "which of these
   40 things broke?"
3. No parallelization → Everything sequential because
   apparently we hate speed
4. Some skills returned 3,000+ tokens → The token diet
   didn't last long
</code></pre></div></div>

<p><strong>The Comedy of Errors:</strong></p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
User: "Style my dashboard in dark mode."
(A simple, reasonable request.)

Agent's internal monologue:
"Hmm, I have 40 tools. Let me read each description...
create_dashboard? No...
update_dashboard? Maybe?
create_widget? Probably not...
update_theme? OH WAIT THAT'S THE ONE! ✅
...only took me 3 seconds to figure that out"

User: _has already rage-quit_

</code></pre></div></div>

<hr />

<h2 id="part-3-multi-agent-architecture---specialized-experts">Part 3: Multi-Agent Architecture - Specialized Experts</h2>

<h3 id="the-why-didnt-we-think-of-this-sooner-moment">The “Why Didn’t We Think of This Sooner?” Moment</h3>

<h4 id="the-breakthrough-cue-dramatic-music">The Breakthrough (Cue Dramatic Music)</h4>

<p>Picture this: It’s 2 AM, you’re on your third coffee, scrolling through LangGraph documentation for the millionth time, when suddenly…</p>

<blockquote>
  <p>“What if… <em>what if</em>… we don’t make one agent do EVERYTHING? What if we have specialized agents? Like… real companies?”</p>
</blockquote>

<p>🤯 <em>Mind. Blown.</em></p>

<p>We felt like we’d invented the wheel, despite the fact that humans have been organizing into specialized roles since, like, the dawn of civilization. But hey, better late than never!</p>

<p><strong>Our New Squad:</strong></p>

<ul>
  <li><strong>Dashboard Agent</strong>: The organized one who actually reads the manual</li>
  <li><strong>Widget Agent</strong>: The creative type, probably went to art school</li>
  <li><strong>Datasource Agent</strong>: The data nerd (affectionately), speaks fluent SQL</li>
  <li><strong>Styling Agent</strong>: Fashion police of the digital world</li>
</ul>

<p>Basically, we went from having one overworked, stressed-out agent having a breakdown, to a healthy work environment with proper delegation. <em>Revolutionary.</em></p>

<hr />

<p>Like a company with departments (Sales, Engineering, Design), each agent owns one domain and nothing else.</p>

<p><img src="/engineering/assets/images/multi-agent-system-architecture.png" alt="Multi-Agent System Architecture" /></p>

<h3 id="federal-router-implementation">Federal Router Implementation</h3>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Lightweight orchestrator</span>
<span class="kd">const</span> <span class="nx">federalAgent</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">ChatOpenAI</span><span class="p">({</span> <span class="na">model</span><span class="p">:</span> <span class="dl">"</span><span class="s2">gpt-4</span><span class="dl">"</span> <span class="p">}).</span><span class="nf">bindTools</span><span class="p">([</span>
  <span class="nx">routeToDashboardAgent</span><span class="p">,</span>
  <span class="nx">routeToWidgetAgent</span><span class="p">,</span>
  <span class="nx">routeToDatasourceAgent</span><span class="p">,</span>
  <span class="nx">routeToStylingAgent</span><span class="p">,</span>
<span class="p">]);</span>

<span class="kd">const</span> <span class="nx">federalPrompt</span> <span class="o">=</span> <span class="s2">`
You are a routing assistant. Delegate to specialists:

- Dashboard operations → Dashboard Agent
- Widget creation/editing → Widget Agent
- Data selection → Datasource Agent
- Styling/themes → Styling Agent

You route and synthesize - you don't execute tasks.
`</span><span class="p">;</span>
</code></pre></div></div>
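<p>Stripped of framework plumbing, the router is just a dispatch table. A framework-free sketch (the agent names are ours; the stub handlers stand in for real specialist invocations, which in our setup are the <code>routeTo*</code> tools above):</p>

```typescript
// The routing layer reduced to a dispatch table. Handlers are stubs
// standing in for real specialist-agent invocations.
type AgentName = "dashboard" | "widget" | "datasource" | "styling";

const routes: Record<AgentName, (task: string) => Promise<string>> = {
  dashboard: async (task) => `dashboard agent handled: ${task}`,
  widget: async (task) => `widget agent handled: ${task}`,
  datasource: async (task) => `datasource agent handled: ${task}`,
  styling: async (task) => `styling agent handled: ${task}`,
};

async function dispatch(agent: AgentName, task: string): Promise<string> {
  // The federal agent's tool call picks the key; we just look it up.
  return routes[agent](task);
}
```

<p>The LLM now chooses among four keys instead of forty tools, which is the entire trick.</p>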

<h3 id="specialized-agent-example">Specialized Agent Example</h3>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Dashboard Agent - Only 6 focused tools</span>
<span class="kd">const</span> <span class="nx">dashboardAgent</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">ChatOpenAI</span><span class="p">({</span> <span class="na">model</span><span class="p">:</span> <span class="dl">"</span><span class="s2">gpt-4</span><span class="dl">"</span> <span class="p">}).</span><span class="nf">bindTools</span><span class="p">([</span>
  <span class="nx">createDashboard</span><span class="p">,</span>
  <span class="nx">getDashboard</span><span class="p">,</span>
  <span class="nx">updateDashboard</span><span class="p">,</span>
  <span class="nx">createTab</span><span class="p">,</span>
  <span class="nx">updateTab</span><span class="p">,</span>
  <span class="nx">deleteDashboard</span><span class="p">,</span>
<span class="p">]);</span>
</code></pre></div></div>

<h3 id="conversation-flow">Conversation Flow</h3>

<p><strong>User</strong>: “Create sales dashboard with gender chart from Customer Survey”</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌────────────────────────────────────────┐
│ 1. Federal Router                      │
│ Tools: 4 routing (400 tokens)          │
│ Decision: Dashboard → Data → Widget    │
│ Cost: 1,400 tokens                     │
└────────────────────────────────────────┘
                ↓
┌────────────────────────────────────────┐
│ 2. Dashboard Agent                     │
│ Tools: 6 dashboard (600 tokens)        │
│ Creates: "Sales" dashboard (ID 100)    │
│ Cost: 1,400 tokens                     │
└────────────────────────────────────────┘
                ↓
┌────────────────────────────────────────┐
│ 3. Datasource Agent                    │
│ Tools: 10 data (1,000 tokens)          │
│ Finds: Survey 12345, Question 67890    │
│ Cost: 2,200 tokens                     │
└────────────────────────────────────────┘
                ↓
┌────────────────────────────────────────┐
│ 4. Widget Agent                        │
│ Tools: 8 widget (800 tokens)           │
│ Creates: Pie chart widget              │
│ Cost: 2,800 tokens                     │
└────────────────────────────────────────┘

Total workflow: ~8,200 tokens including the router's
final synthesis pass (vs 60,000)
</code></pre></div></div>
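<p>The dollar figures fall straight out of the token counts. A sketch of the napkin math (the blended per-token rate is our own assumption; it works out to roughly $0.01 per 1K tokens across the models we mix, not an official price):</p>

```typescript
// Convert workflow tokens to dollars. BLENDED_RATE_PER_1K is an
// assumption from our own napkin math, not an official price.
const BLENDED_RATE_PER_1K = 0.01; // USD per 1,000 tokens

function workflowCostUSD(tokens: number): number {
  return (tokens / 1000) * BLENDED_RATE_PER_1K;
}

// workflowCostUSD(8_200)  ≈ $0.082 per multi-agent workflow
// workflowCostUSD(60_000) ≈ $0.60 with the old monolith
```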

<h3 id="the-results-holy-sh-i-mean-wow">The Results (Holy Sh— I Mean, Wow!)</h3>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
📊 The Numbers Don't Lie (But We Triple-Checked Anyway):

- Avg tokens per workflow: 8,200 (was 60,000) 📉
- Cost per workflow: $0.082 (was $0.60) 💸
- P95 latency: 3.5s (was 8s) 🚀
- Projected monthly costs: $780 (was $4,500) 🎯
- Success rate: 97% (was 89%) ✨

✅ 86% token reduction (not a typo!)
✅ 86% cost savings (CFO wants to buy us lunch)
✅ 56% faster responses (users think we upgraded servers)
✅ Clear debugging (we now know WHICH agent to blame)
✅ Team happiness: Through the roof
✅ Sleep quality: Significantly improved

</code></pre></div></div>

<p>At this point, we were pretty sure we’d reached peak performance. We were wrong.</p>

<hr />

<h2 id="part-4-the-hybrid-approach---maximum-efficiency">Part 4: The Hybrid Approach - Maximum Efficiency</h2>

<h3 id="peak-engineering">Peak Engineering</h3>

<h4 id="or-what-happens-when-you-combine-your-two-good-ideas">Or: What Happens When You Combine Your Two Good Ideas</h4>

<p>After achieving what we thought was AI nirvana with multi-agents, I had a dangerous thought:</p>

<blockquote>
  <p>“Hey… what if we combine multi-agents WITH the skills pattern?”</p>
</blockquote>

<p>It was the kind of idea that precedes either genius or disaster. (In software engineering, these are often the same thing.)</p>

<p>But then we realized: <strong>Why choose between our two good ideas when we can have BOTH?</strong></p>

<p>Multi-agents solved the “too many tools” problem, but we could still use skills to optimize even further. It’s like discovering that peanut butter AND jelly make a better sandwich together. Revolutionary? No. Delicious? Absolutely.</p>

<h3 id="combining-the-best-of-both">Combining the Best of Both</h3>

<p>Multi-agents solved tool confusion, but skills can optimize even further. <strong>Each specialized agent uses skills for deep context.</strong></p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Widget Agent with Skills</span>
<span class="kd">const</span> <span class="nx">widgetAgent</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">ChatOpenAI</span><span class="p">({</span> <span class="na">model</span><span class="p">:</span> <span class="dl">"</span><span class="s2">gpt-4</span><span class="dl">"</span> <span class="p">}).</span><span class="nf">bindTools</span><span class="p">([</span>
  <span class="c1">// Core tools (lightweight, always bound)</span>
  <span class="nx">createWidget</span><span class="p">,</span>
  <span class="nx">updateWidget</span><span class="p">,</span>
  <span class="nx">deleteWidget</span><span class="p">,</span>

  <span class="c1">// Skill loader (heavy context on-demand)</span>
  <span class="nx">loadSkill</span><span class="p">,</span>
<span class="p">]);</span>

<span class="kd">const</span> <span class="nx">widgetPrompt</span> <span class="o">=</span> <span class="s2">`
You are the Widget Agent - visualization expert.

When you need details:
- Chart config → load_skill("bar_chart_config")
- Widget styling → load_skill("widget_styling", widgetId)
- Data mapping → load_skill("data_mapping_rules")

Current: Dashboard </span><span class="p">${</span><span class="nx">state</span><span class="p">.</span><span class="nx">dashboardId</span><span class="p">}</span><span class="s2">, Tab </span><span class="p">${</span><span class="nx">state</span><span class="p">.</span><span class="nx">tabId</span><span class="p">}</span><span class="s2">
`</span><span class="p">;</span>
</code></pre></div></div>

<p><img src="/engineering/assets/images/hybrid-architecture-diagram.png" alt="Hybrid Architecture Diagram" /></p>

<h3 id="hybrid-in-action">Hybrid in Action</h3>

<p><strong>User</strong>: “Create a pie chart showing age distribution”</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Federal Router → Datasource Agent → Widget Agent

Datasource Agent:
├─ Loads: survey_schema skill (2,000 tokens)
├─ Finds age question
└─ Returns structured data

Widget Agent:
├─ Loads: pie_chart_config skill (1,500 tokens)
├─ Creates widget with proper mapping
└─ Total: 6,200 tokens ✅

Vs. loading everything upfront: 18,000 tokens ❌
</code></pre></div></div>

<h3 id="the-metrics">The Metrics</h3>

<p>After implementing the hybrid approach and running it through our test suite (read: throwing everything at it to see what breaks), we got these numbers. We didn’t believe them at first either.</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
📊 The "Are We Sure This Is Right?" Results:

Token Usage (The Good Kind of Reduction):
├─ Simple queries: 2,500 (was 12,000) → 79% reduction 🎯
├─ Medium queries: 6,000 (was 35,000) → 83% reduction 💪
└─ Complex workflows: 12,000 (was 80,000) → 85% reduction 🚀

Cost Per User/Month:
├─ Light users: $0.25 (was $1.20) → 79% savings
├─ Regular users: $1.50 (was $8.75) → 83% savings
└─ Power users: $6.00 (was $35.00) → 83% savings
(Power users can stay, we can afford them now!)

Performance (Users Think We're Wizards):
├─ P50 latency: 1.8s (was 4.2s) → 57% faster ⚡
├─ P95 latency: 3.5s (was 8.7s) → 60% faster ⚡⚡
└─ P99 latency: 5.2s (was 15s) → 65% faster ⚡⚡⚡

Reliability (Actually Impressed Ourselves):
├─ Success rate: 97% (was 89%) → +8% improvement
├─ Tool selection: 99% (was 85%) → +14% improvement
└─ Context overflow: 0.1% (was 8%) → Basically extinct

💰 Projected Monthly Costs: $780 (was $4,500)
💰 Annual Savings: $44,640
💰 Engineer Stress Levels: Down 90%
☕ Coffee Consumption: Actually decreased
😴 Sleep Quality: Markedly improved

</code></pre></div></div>

<hr />

<h2 id="choosing-your-pattern-the-decision-tree">Choosing Your Pattern: The Decision Tree</h2>

<h3 id="the-which-one-do-i-actually-need-decision-tree">The “Which One Do I Actually Need?” Decision Tree</h3>

<h4 id="or-saving-you-from-our-mistakes">Or: Saving You From Our Mistakes</h4>

<p>Look, we tried ALL the things so you don’t have to. Here’s our hard-won wisdom, gained through tears, tokens, and too much coffee:</p>

<h3 id="final-architecture-overview">Final Architecture Overview</h3>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
    ┌─────────────────────────────────────────────┐
    │ Federal Router Agent (GPT-3.5)              │
    │ 4 routing tools (400 tokens)                │
    │ Fast, cheap classification                  │
    └─────────────────────────────────────────────┘
                      │
          ┌───────────┼───────────┬──────────┐
          ↓           ↓           ↓          ↓
    ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
    │Dashboard │ │ Widget   │ │Datasource│ │ Styling  │
    │ (GPT-4)  │ │ (GPT-4)  │ │ (GPT-4)  │ │(GPT-3.5) │
    ├──────────┤ ├──────────┤ ├──────────┤ ├──────────┤
    │6 tools   │ │8 tools   │ │10 tools  │ │6 tools   │
    │+ Skills: │ │+ Skills: │ │+ Skills: │ │+ Skills: │
    │ filters  │ │ configs  │ │ schemas  │ │ themes   │
    │ layouts  │ │ mapping  │ │ datasets │ │ fonts    │
    │ rules    │ │ styling  │ │ stacks   │ │ a11y     │
    └──────────┘ └──────────┘ └──────────┘ └──────────┘

</code></pre></div></div>
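<p>The model split in the diagram is deliberate: cheap, fast models where the job is routine classification or rule-following, stronger models where real reasoning over schemas and configs is needed. As a config sketch (the model identifiers are illustrative; use whatever your provider calls them):</p>

```typescript
// Cheap model for routing and rule-driven styling, stronger model
// where real reasoning over schemas and configs is needed.
// Model identifiers are illustrative.
const agentModels = {
  federalRouter: "gpt-3.5-turbo", // 4 tools, pure classification
  dashboard: "gpt-4",
  widget: "gpt-4",
  datasource: "gpt-4",
  styling: "gpt-3.5-turbo", // themes/fonts follow fixed rules
} as const;

type Agent = keyof typeof agentModels;

function modelFor(agent: Agent): string {
  return agentModels[agent];
}
```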

<hr />

<h2 id="lessons-from-the-trenches">Lessons From the Trenches</h2>

<h3 id="what-actually-worked-surprisingly">What Actually Worked (Surprisingly)</h3>

<h4 id="1-baby-steps-dont-just-work-theyre-essential">1. Baby Steps Don’t Just Work, They’re Essential</h4>

<p>We wanted to rebuild everything overnight. Our tech lead said “no.” We’re glad he did. Migrate one pattern at a time, using the bands below as a guide:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
┌─────────────────────┬──────────────┬────────────────┐
│ Your Situation      │ Pattern      │ Expected Gain  │
├─────────────────────┼──────────────┼────────────────┤
│ &lt; 10 tools          │ Basic Agent  │ Keep simple    │
│ Simple context      │              │                │
├─────────────────────┼──────────────┼────────────────┤
│ 10-25 tools         │ Skills       │ 50-60% savings │
│ Large context       │              │                │
│ Single domain       │              │                │
├─────────────────────┼──────────────┼────────────────┤
│ 25-50 tools         │ Multi-Agent  │ 70-80% savings │
│ Multiple domains    │              │                │
│ Clear separation    │              │                │
├─────────────────────┼──────────────┼────────────────┤
│ 50+ tools           │ Hybrid       │ 80-85% savings │
│ Complex domains     │              │                │
│ Deep context needs  │              │                │
└─────────────────────┴──────────────┴────────────────┘

</code></pre></div></div>
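<p>The table condenses to a few thresholds. The sketch below mirrors its bands; the exact boundaries are judgment calls, not laws:</p>

```typescript
// The decision table as code. Thresholds mirror the bands above;
// treat them as rough guidance, not hard rules.
type Pattern = "basic" | "skills" | "multi-agent" | "hybrid";

function pickPattern(
  toolCount: number,
  domainCount: number,
  heavyContext: boolean
): Pattern {
  if (toolCount < 10 && !heavyContext) return "basic";
  if (toolCount <= 25 && domainCount === 1) return "skills";
  if (toolCount <= 50) return "multi-agent";
  return "hybrid";
}

// pickPattern(40, 3, true) → "multi-agent"
```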

<p><img src="/engineering/assets/images/decision-tree-flowchart.png" alt="Decision Tree Flowchart" /></p>

<hr />

<h2 id="hard-truths-we-learned-the-expensive-way">Hard Truths We Learned (The Expensive Way)</h2>

<h3 id="1-tool-descriptions-write-them-like-your-job-depends-on-it">1. Tool Descriptions: Write Them Like Your Job Depends On It</h3>

<p>Because your token budget literally does.</p>

<p>❌ Bad (Our First Attempt):</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">createWidget</span><span class="p">:</span> <span class="p">{</span>
  <span class="nl">description</span><span class="p">:</span> <span class="dl">"</span><span class="s2">Creates a widget</span><span class="dl">"</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// LLM: "Cool story bro, but WHEN do I use this?"</span>
</code></pre></div></div>

<p>✅ Good (After Many Painful Iterations):</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">createWidget</span><span class="p">:</span> <span class="p">{</span>
  <span class="nl">description</span><span class="p">:</span> <span class="s2">`Creates a visualization widget.
  
  Use when: User asks to "add chart/graph/visualization"
  Don't use for: Updating widgets (use update_widget)
  
  // Yes, we're treating the LLM like a junior dev.
  // Yes, it works.`</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Pro tip: Think of tool descriptions as writing documentation for the world’s most literal intern. Because that’s essentially what it is.</p>

<h3 id="2-monitor-what-matters">2. Monitor What Matters</h3>

<p>Track more than just cost:</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">({</span>
  <span class="na">event</span><span class="p">:</span> <span class="dl">"</span><span class="s2">agent_execution</span><span class="dl">"</span><span class="p">,</span>
  <span class="na">workflowName</span><span class="p">:</span> <span class="dl">"</span><span class="s2">dashboard_creation</span><span class="dl">"</span><span class="p">,</span>
  <span class="na">metadata</span><span class="p">:</span> <span class="p">{</span>
    <span class="na">agentType</span><span class="p">:</span> <span class="dl">"</span><span class="s2">widget_agent</span><span class="dl">"</span><span class="p">,</span>
    <span class="na">skillsLoaded</span><span class="p">:</span> <span class="p">[</span><span class="dl">"</span><span class="s2">bar_chart_config</span><span class="dl">"</span><span class="p">],</span>
    <span class="na">tokensUsed</span><span class="p">:</span> <span class="mi">3800</span><span class="p">,</span>
  <span class="p">},</span>
<span class="p">});</span>
</code></pre></div></div>
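<p>Structured logs only pay off if you aggregate them. A minimal sketch that sums token usage per agent (the log shape matches the entry above) so the most expensive specialist is obvious at a glance:</p>

```typescript
// Aggregate tokensUsed by agentType so you know which
// specialist to optimize next.
interface AgentLogEntry {
  event: string;
  workflowName: string;
  metadata: { agentType: string; skillsLoaded: string[]; tokensUsed: number };
}

function tokensByAgent(logs: AgentLogEntry[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const entry of logs) {
    const agent = entry.metadata.agentType;
    totals[agent] = (totals[agent] ?? 0) + entry.metadata.tokensUsed;
  }
  return totals;
}
```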

<h3 id="3-progressive-skill-rollout">3. Progressive Skill Rollout</h3>

<p>Don’t implement all skills at once. Start with the most-used:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Priority 1: survey_schema (80% of requests)
Priority 2: bar_chart_config (60% of requests)
Priority 3: widget_styling (40% of requests)
Priority 4: dashboard_filters (30% of requests)
</code></pre></div></div>

<h3 id="4-dont-over-specialize-learn-from-our-hubris">4. Don’t Over-Specialize (Learn From Our Hubris)</h3>

<p>In our enthusiasm, we initially created 8 agents. EIGHT. We were basically creating an AI bureaucracy.</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Original (Too Many):
├─ Dashboard Agent
├─ Tab Agent ❌ (seriously, tabs needed their own agent?)
├─ Widget Agent
├─ Widget Styling Agent ❌ (why did we do this to ourselves)
├─ Survey Agent ❌
├─ Dataset Agent ❌
├─ Analytics Agent ❌
└─ Export Agent ❌

After Therapy (Better):
├─ ✅ Dashboard Agent (includes tabs, we're not savages)
├─ ✅ Widget Agent (includes styling, it's fine)
├─ ✅ Datasource Agent (all data sources, one happy family)
└─ ✅ Styling Agent (themes + accessibility)

</code></pre></div></div>

<p>Sweet spot: <strong>4-6 agents</strong></p>

<p>More than that and you’re just creating coordination overhead. It’s like having too many group chats - eventually nobody knows what’s happening where.</p>

<hr />

<h2 id="quick-wins-thatll-make-you-look-like-a-hero">Quick Wins That’ll Make You Look Like a Hero</h2>

<h3 id="seriously-do-these-today">(Seriously, Do These Today)</h3>

<h4 id="1-stop-writing-novellas-in-your-system-prompts-30-minutes">1. Stop Writing Novellas in Your System Prompts (30 minutes)</h4>

<p>Your system prompt is not the place to explain your life story, company history, and philosophical stance on data visualization.</p>
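<p>If you want a guardrail rather than willpower, you can lint prompt length in CI. A minimal sketch, assuming a crude 4-characters-per-token estimate and an arbitrary 500-token budget (tune both for your stack; this is not a real tokenizer):</p>

```typescript
// Crude token estimate: roughly 4 characters per token for English text.
// Good enough to catch a novella; not a substitute for a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

const MAX_PROMPT_TOKENS = 500; // illustrative budget, adjust for your agent

function checkSystemPrompt(prompt: string): { ok: boolean; tokens: number } {
  const tokens = estimateTokens(prompt);
  return { ok: tokens <= MAX_PROMPT_TOKENS, tokens };
}
```

<p>Run it against every system prompt in the repo and fail the build when someone sneaks a novella back in.</p>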

<h4 id="2-return-structured-data-from-skills">2. Return Structured Data from Skills</h4>

<p>❌ Bad (text):</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="dl">"</span><span class="s2">The survey has gender as multiple choice and age as numeric...</span><span class="dl">"</span><span class="p">;</span>
</code></pre></div></div>

<p>✅ Good (JSON):</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span>
  <span class="nl">questions</span><span class="p">:</span> <span class="p">[</span>
    <span class="p">{</span> <span class="na">id</span><span class="p">:</span> <span class="mi">67890</span><span class="p">,</span> <span class="na">text</span><span class="p">:</span> <span class="dl">"</span><span class="s2">Gender?</span><span class="dl">"</span><span class="p">,</span> <span class="na">type</span><span class="p">:</span> <span class="dl">"</span><span class="s2">multiple_choice</span><span class="dl">"</span> <span class="p">},</span>
    <span class="p">{</span> <span class="na">id</span><span class="p">:</span> <span class="mi">67891</span><span class="p">,</span> <span class="na">text</span><span class="p">:</span> <span class="dl">"</span><span class="s2">Age?</span><span class="dl">"</span><span class="p">,</span> <span class="na">type</span><span class="p">:</span> <span class="dl">"</span><span class="s2">numeric</span><span class="dl">"</span> <span class="p">},</span>
  <span class="p">];</span>
<span class="p">}</span>
</code></pre></div></div>

<h4 id="3-dont-over-specialize">3. Don’t Over-Specialize</h4>

<p>We initially had 8 agents, consolidated to 4:</p>

<ul>
  <li>✅ Dashboard Agent (includes tabs)</li>
  <li>✅ Widget Agent (includes styling)</li>
  <li>✅ Datasource Agent (all data sources)</li>
  <li>✅ Styling Agent (themes + accessibility)</li>
</ul>

<hr />

<h2 id="lessons-learned-what-weve-learned-so-far">Lessons Learned (So Far)</h2>

<h3 id="the-real-truth-about-building-ai-agents">The Real Truth About Building AI Agents</h3>

<p>Here’s what they don’t tell you in the glossy blog posts and conference talks:</p>

<p>Building scalable AI agents isn’t about having the most sophisticated architecture from day one. It’s about:</p>

<ol>
  <li><strong>Start simple</strong> - Basic agent for MVP (it’s okay, we all started here)</li>
  <li><strong>Measure everything</strong> - If you’re not tracking tokens, you’re flying blind</li>
  <li><strong>Fail fast, learn faster</strong> - We made EVERY mistake so you don’t have to</li>
  <li><strong>Evolve incrementally</strong> - Skills → Multi-Agent → Hybrid (this is the way)</li>
  <li><strong>Focus on user value</strong> - Fast, accurate, reliable (novel concept, we know)</li>
</ol>
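<p>Point 2 is worth making concrete. A per-request token ledger can start as small as this (a sketch; the aggregation and field names are hypothetical, loosely mirroring the telemetry shown earlier in the post):</p>

```typescript
interface RequestRecord {
  agentType: string;
  tokensUsed: number;
}

// Aggregate tokens per agent so cost hot spots are visible at a glance.
function tokensByAgent(records: RequestRecord[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of records) {
    totals.set(r.agentType, (totals.get(r.agentType) ?? 0) + r.tokensUsed);
  }
  return totals;
}
```

<p>Once totals per agent are visible, the expensive agent stops being a mystery and becomes a line item.</p>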

<h3 id="our-journey-in-numbers-the-beforeafter-youve-been-waiting-for">Our Journey in Numbers (The Before/After You’ve Been Waiting For)</h3>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Where We Started (The Dark Times):
├─ Cost: $4,500/month (gulp)
├─ Tokens: 15k per request (yikes)
├─ Success: 89% (not great, Bob)
└─ Team morale: Low

Where We're At Now (The Good Times):
├─ Cost: $780/month (manageable!)
├─ Tokens: 6.6k per request (sustainable)
├─ Success: 97% (actually good!)
└─ Team morale: High
</code></pre></div></div>

<h3 id="whats-next-or-what-were-learning-next">What’s Next (Or: What We’re Learning Next)</h3>

<p>As we continue building and scaling QuestionPro’s BI agent, here’s what we’re watching:</p>

<ul>
  <li><strong>Context management is king</strong> - Progressive disclosure isn’t optional, it’s survival</li>
  <li><strong>Specialization is inevitable</strong> - One agent doing everything is like one person doing all jobs at a company (recipe for disaster)</li>
  <li><strong>Hybrid is the sweet spot</strong> - Why choose between good ideas?</li>
  <li><strong>Cost optimization is non-negotiable</strong> - It’s literally the difference between “cool AI feature” and “bankrupt company”</li>
  <li><strong>Testing is hard</strong> - LLMs are non-deterministic. Our test suite has trust issues.</li>
  <li><strong>Documentation matters</strong> - Future you will thank present you (we learned this the hard way)</li>
</ul>

<h3 id="resources-that-actually-helped-us">Resources (That Actually Helped Us)</h3>

<p><strong>LangGraph Documentation:</strong></p>

<ul>
  <li>Multi-Agent Patterns (bookmark this, seriously)</li>
  <li>Progressive Disclosure / Skills Pattern (game changer)</li>
  <li>Examples that actually work (rare, treasure them)</li>
</ul>

<p><strong>Monitoring Tools We Actually Use:</strong></p>

<ul>
  <li>Langfuse (our current favorite)</li>
  <li>LangSmith (also good)</li>
  <li>Custom OpenTelemetry (for the brave)</li>
  <li>Coffee (monitors our alertness)</li>
</ul>

<h3 id="calculate-your-potential-savings-do-this-right-now">Calculate Your Potential Savings (Do This Right Now)</h3>

<p>Seriously, take 2 minutes and see what you could be saving:</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Your current situation</span>
<span class="kd">const</span> <span class="nx">currentCost</span> <span class="o">=</span> <span class="p">((</span><span class="nx">monthlyRequests</span> <span class="o">*</span> <span class="nx">avgTokens</span><span class="p">)</span> <span class="o">/</span> <span class="mi">1</span><span class="nx">_000_000</span><span class="p">)</span> <span class="o">*</span> <span class="nx">$10</span><span class="p">;</span>

<span class="c1">// What you COULD be saving</span>
<span class="kd">const</span> <span class="nx">potentialSavings</span> <span class="o">=</span> <span class="p">{</span>
  <span class="na">withSkills</span><span class="p">:</span> <span class="nx">currentCost</span> <span class="o">*</span> <span class="mf">0.5</span><span class="p">,</span> <span class="c1">// 50% reduction</span>
  <span class="na">withMultiAgent</span><span class="p">:</span> <span class="nx">currentCost</span> <span class="o">*</span> <span class="mf">0.7</span><span class="p">,</span> <span class="c1">// 70% reduction</span>
  <span class="na">withHybrid</span><span class="p">:</span> <span class="nx">currentCost</span> <span class="o">*</span> <span class="mf">0.85</span><span class="p">,</span> <span class="c1">// 85% reduction</span>
<span class="p">};</span>

<span class="c1">// Now imagine what you could do with that money</span>
<span class="c1">// (We're thinking coffee budget + GPU upgrades)</span>
</code></pre></div></div>

<hr />

<h2 id="final-thoughts-the-honest-ones">Final Thoughts (The Honest Ones)</h2>

<p>Look, we’re still figuring this out. We’re still learning. We’re still making mistakes (just more expensive ones now that we know better). But that’s the point: nobody has this completely figured out. AI is moving faster than documentation can keep up.</p>

<p>What we DO know:</p>

<ul>
  <li>Start simple, evolve based on actual problems</li>
  <li>Measure EVERYTHING (seriously, token tracking is non-negotiable)</li>
  <li>Don’t over-engineer (we did this so you don’t have to)</li>
  <li>Community knowledge is invaluable (thank you, internet strangers)</li>
  <li>It’s okay to not know everything (we certainly don’t)</li>
</ul>

<p><img src="/engineering/assets/images/journey-visualization.png" alt="Journey Visualization" /></p>

<h3 id="the-future">The Future</h3>

<p>As agents scale:</p>

<ul>
  <li><strong>Context management is king</strong> - Progressive disclosure wins</li>
  <li><strong>Specialization is inevitable</strong> - One agent can’t do it all</li>
  <li><strong>Hybrid is the sweet spot</strong> - Combines best of all patterns</li>
  <li><strong>Cost optimization is non-negotiable</strong> - Make or break for production</li>
</ul>]]></content><author><name>Akhil Kumar</name></author><category term="AI &amp; Machine Learning" /><category term="Software Architecture" /><category term="Engineering" /><category term="Cost Optimization" /><category term="ai agents" /><category term="multi-agent systems" /><category term="langgraph" /><category term="langchain" /><category term="token optimization" /><category term="cost optimization" /><category term="llm" /><category term="gpt-4" /><category term="progressive disclosure" /><category term="software architecture" /><category term="engineering patterns" /><category term="QuestionPro" /><summary type="html"><![CDATA[Or: How I Learned to Stop Worrying and Love the Token Budget]]></summary></entry><entry><title type="html">Fresher to Engineering Manager: Journey at QuestionPro</title><link href="https://www.questionpro.com/engineering/personal%20growth/continuous%20learning/professional%20growth/leadership/questionpro%20culture/fresher-to-engineering-manager-journey-at-questionpro/" rel="alternate" type="text/html" title="Fresher to Engineering Manager: Journey at QuestionPro" /><published>2026-02-17T00:00:00+00:00</published><updated>2026-02-17T00:00:00+00:00</updated><id>https://www.questionpro.com/engineering/personal%20growth/continuous%20learning/professional%20growth/leadership/questionpro%20culture/fresher-to-engineering-manager-journey-at-questionpro</id><content type="html" xml:base="https://www.questionpro.com/engineering/personal%20growth/continuous%20learning/professional%20growth/leadership/questionpro%20culture/fresher-to-engineering-manager-journey-at-questionpro/"><![CDATA[<p>When I look back on my journey with <strong>QuestionPro</strong>, which began in <strong>March 2012</strong>, I’m filled with gratitude, pride, and a deep sense of belonging. 
What started as my first professional step after earning a <strong>degree in Computer Engineering from Pune University</strong> has evolved into a career-defining experience spanning over a decade.</p>

<h2 id="the-early-days">The Early Days</h2>

<p>When I joined QuestionPro, the company had just moved its office from <strong>Nashik to Pune</strong>. Our first workspace was small but cozy—more like a collaborative lab than a traditional office. There were only <strong>eight of us</strong> then, wearing multiple hats and learning as we went.</p>

<p>I started as a <strong>Java Developer</strong>, primarily focused on <strong>bug fixes, feature testing,</strong> and writing or updating <strong>help documentation</strong>. Those early tasks might seem small, but they taught me the importance of precision, patience, and understanding the product from the user’s perspective.</p>

<p><img src="/engineering/assets/images/questionpro-pune-team.jpg" alt="QuestionPro Pune Lonavala Trip" /></p>

<h2 id="learning-mentorship-and-confidence">Learning, Mentorship, and Confidence</h2>

<p>The next two years were a turning point. Under the mentorship of <strong>Shri and Anish</strong>, I began to take on larger challenges, own modules, and develop new features. Their guidance shaped not only my technical skills but also my professional mindset.</p>

<p>They taught me that leadership isn’t just about giving directions—it’s about <strong>owning outcomes</strong> and continuously learning. Even today, they remain my <strong>role models</strong>, and I still find myself learning from their approach to problem-solving and innovation.</p>

<h2 id="stepping-into-leadership">Stepping into Leadership</h2>

<p>As QuestionPro grew, so did my responsibilities. I transitioned into a <strong>Team Lead (TL)</strong> role, managing a small team while continuing to code and contribute hands-on. This phase was both exciting and demanding—it pushed me to develop skills in <strong>people management, communication</strong>, and <strong>strategic thinking</strong>.</p>

<p>However, what truly made it memorable were the moments outside of <strong>work</strong>—our <strong>Friday night gatherings, table tennis matches</strong>, and <strong>cricket games</strong>. These experiences built lasting friendships and strengthened our culture of collaboration and trust.</p>

<h2 id="working-with-visionary-leaders">Working with Visionary Leaders</h2>

<p>One of the highlights of my career has been the opportunity to work directly with <strong>Vivek Bhaskaran</strong>, our <strong>Founder &amp; CEO</strong>, and <strong>Erik Koto</strong>, our <strong>COO/Ex-CEO</strong>. Having direct access to the leadership team gave me a unique perspective on how decisions are made and how innovation is driven at scale.</p>

<p>The management team at QuestionPro has always been <strong>accessible, supportive</strong>, and <strong>empathetic</strong>. They’ve built a culture where everyone’s ideas are valued, and every challenge is seen as an opportunity to grow.</p>

<h2 id="growth-ownership-and-empowerment">Growth, Ownership, and Empowerment</h2>

<p>Over the years, my focus has shifted from purely technical work to <strong>engineering leadership</strong>. I’ve learned the art of <strong>hiring, mentoring</strong>, and <strong>helping team members</strong> chart their own career paths. Watching people I’ve mentored grow into confident engineers and leaders has been one of the most rewarding aspects of my journey.</p>

<p>Today, as an <strong>Engineering Manager</strong>, I handle <strong>three different products</strong>, each with its own challenges and learning curves. The responsibility is immense, but so is the satisfaction that comes with building scalable, user-centered products that impact thousands of users worldwide.</p>

<h2 id="what-makes-questionpro-special">What Makes QuestionPro Special</h2>

<p>If I were to summarize what makes QuestionPro unique, it would be its <strong>open culture</strong> and <strong>growth mindset</strong>. Here, you’re encouraged to take initiative, challenge assumptions, and experiment with new ideas.</p>

<p>Failures aren’t punished—they’re treated as learning opportunities. That philosophy has allowed me to evolve continuously, both as an engineer and as a leader.</p>

<p><img src="/engineering/assets/images/questionpro-pune-lonavala-trip.jpg" alt="QuestionPro Pune Lonavala Trip" /></p>

<h2 id="looking-backand-ahead">Looking Back—and Ahead</h2>

<p>From debugging my first piece of code to managing multiple teams and products, every step of this journey has taught me something valuable.</p>

<ul>
  <li>
    <p><strong>QuestionPro</strong> has given me more than a career—it has given me a community.</p>
  </li>
  <li>
    <p>It has shown me that growth doesn’t happen by chance; it happens by <strong>staying curious, embracing challenges</strong>, and <strong>never stopping learning</strong>.</p>
  </li>
  <li>
    <p>And most importantly, it has reinforced my belief that <strong>great products are built by great people</strong> who trust and learn from each other.</p>
  </li>
</ul>

<p>As I look to the future, I’m excited to continue this journey of building, leading, and learning—together with a team that feels more like family.</p>

<hr />]]></content><author><name>Kiran Dongare</name></author><category term="Personal Growth" /><category term="Continuous Learning" /><category term="Professional Growth" /><category term="Leadership" /><category term="QuestionPro Culture" /><category term="personal growth" /><category term="professional growth" /><category term="leadership" /><category term="communication" /><category term="team culture" /><category term="people management" /><category term="continuous learning" /><category term="questionpro journey" /><summary type="html"><![CDATA[When I look back on my journey with QuestionPro, which began in March 2012, I’m filled with gratitude, pride, and a deep sense of belonging. What started as my first professional step after earning a degree in Computer Engineering from Pune University has evolved into a career-defining experience spanning over a decade.]]></summary></entry><entry><title type="html">The Art of Not Breaking Things: A CbC Manifesto… a mindset to build polished code</title><link href="https://www.questionpro.com/engineering/software%20engineering%20philosophy/code%20quality%20&%20maintainability/programming%20best%20practices/type%20systems%20&%20safety/backend%20&%20application%20architecture/developer%20mindset%20&%20productivity/correctness-by-construction/" rel="alternate" type="text/html" title="The Art of Not Breaking Things: A CbC Manifesto… a mindset to build polished code" /><published>2026-02-05T00:00:00+00:00</published><updated>2026-02-05T00:00:00+00:00</updated><id>https://www.questionpro.com/engineering/software%20engineering%20philosophy/code%20quality%20&amp;%20maintainability/programming%20best%20practices/type%20systems%20&amp;%20safety/backend%20&amp;%20application%20architecture/developer%20mindset%20&amp;%20productivity/correctness-by-construction</id><content type="html" 
xml:base="https://www.questionpro.com/engineering/software%20engineering%20philosophy/code%20quality%20&amp;%20maintainability/programming%20best%20practices/type%20systems%20&amp;%20safety/backend%20&amp;%20application%20architecture/developer%20mindset%20&amp;%20productivity/correctness-by-construction/"><![CDATA[<p>Writing code is a lot like carpentry. If you eyeball the cut, saw like a maniac with confusion in mind, and then later try to fix the gap with glue, you aren’t building a table, you are building a hazard.</p>

<p><strong>Correctness by Construction (CbC)</strong> is the shift from the chaotic energy of “Let’s see if this runs” to the calm confidence of “I know this runs because I made it impossible for it not to.”</p>

<p>P.S. “No, not Complete Blood Count (CBC) :3”</p>

<p>Before jumping into how to achieve the CbC mindset, let me talk about the potential tradeoffs:</p>

<h3 id="pros">Pros</h3>

<ul>
  <li><strong>Peace</strong>: You spend 90% less time in the debugger (which is just a crime scene where you are both the detective and the murderer).</li>
  <li><strong>Sleep</strong>: Yeah.</li>
</ul>

<h3 id="cons">Cons</h3>

<ul>
  <li><strong>Slower Start</strong>: You have to think before you type. It feels less like “hacking” and more like “engineering.”</li>
  <li><strong>Boredom</strong>: You won’t look as busy to your peers because you aren’t “putting out fires” constantly.</li>
</ul>

<p>Here is the toolkit for your CbC journey, a guide to replacing ‘hope’ with the certainty of flawless, intentional design.</p>

<h2 id="the-prologue-strong-specifications">The Prologue: Strong Specifications</h2>

<p><img src="/engineering/assets/images/cbc-strong-specifications.jpg" alt="Strong Specifications" /></p>

<p>Before you write a single line of code, write the story. Define exactly what the function does.</p>

<ul>
  <li><strong>The Vibe</strong>: Don’t build a house based on a napkin sketch.</li>
  <li><strong>The Habit</strong>: If you can’t explain the constraints in English, you certainly can’t explain them in JavaScript ;)</li>
</ul>

<hr />

<h2 id="the-bouncer-safe-languages--types">The Bouncer: Safe Languages &amp; Types</h2>

<p><img src="/engineering/assets/images/cbc-bouncer.jpg" alt="The Bouncer" /></p>

<p>Use your type system to forbid “impossible” situations. A bug cannot happen if the code literally refuses to compile it.</p>

<p><strong>Don’t:</strong> Build a “Frankenstein” object full of optional flags</p>

<ul>
  <li><strong>Bad</strong>: A request state like</li>
</ul>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span> <span class="nl">isLoading</span><span class="p">:</span> <span class="nx">boolean</span><span class="p">,</span> <span class="nx">error</span><span class="p">?:</span> <span class="kr">string</span><span class="p">,</span> <span class="nx">data</span><span class="p">?:</span> <span class="kr">string</span> <span class="p">}</span>
</code></pre></div></div>

<ul>
  <li><strong>The Trap</strong>: You can accidentally create a state where isLoading: true and error: “Failed”. Is it loading? Is it broken? The code is confused, and so are you.</li>
</ul>

<p><strong>Do</strong>: Use Discriminated Unions. Force the data to exist only when the state allows it.</p>

<ul>
  <li><strong>Good</strong>: Define distinct states</li>
</ul>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">type</span> <span class="nx">State</span> <span class="o">=</span> <span class="p">{</span> <span class="na">status</span><span class="p">:</span> <span class="dl">"</span><span class="s2">loading</span><span class="dl">"</span> <span class="p">}</span> <span class="o">|</span> <span class="p">{</span> <span class="na">status</span><span class="p">:</span> <span class="dl">"</span><span class="s2">success</span><span class="dl">"</span><span class="p">;</span> <span class="nl">data</span><span class="p">:</span> <span class="kr">string</span> <span class="p">};</span>
</code></pre></div></div>

<p>If you check status === ‘loading’, TypeScript physically prevents you from accessing .data. It doesn’t exist yet. The compiler catches the bug before you even run the code.</p>
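<p>To make that narrowing concrete, here is a minimal consumer (the render function is our illustration, not part of the original example):</p>

```typescript
type State = { status: "loading" } | { status: "success"; data: string };

function render(state: State): string {
  if (state.status === "loading") {
    // state.data does not exist on this variant; accessing it
    // would be a compile-time error, not a runtime surprise.
    return "Spinner";
  }
  // TypeScript has narrowed state to the success variant here,
  // so .data is guaranteed to be present.
  return state.data;
}
```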

<h2 id="the-handshake-design-by-contract">The Handshake: Design-by-Contract</h2>

<p><img src="/engineering/assets/images/cbc-design-by-contract.jpg" alt="The Handshake" /></p>

<p>Every function is a business deal.</p>

<ul>
  <li><strong>Preconditions</strong>: “You promise to give me a positive number.”</li>
  <li><strong>Postconditions</strong>: “I promise to return its square root.”</li>
  <li><strong>Invariants</strong>: “I promise the database never catches fire.” If the contract is broken, the program shouldn’t try to limp along; it should shout immediately!</li>
</ul>
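<p>In languages without built-in contracts, plain assertions get you most of the way there. A hedged sketch; the requires/ensures helpers are hypothetical names, not a library API:</p>

```typescript
// Minimal contract helpers: fail loudly the moment a promise is broken.
function requires(condition: boolean, message: string): void {
  if (!condition) throw new Error(`Precondition violated: ${message}`);
}

function ensures(condition: boolean, message: string): void {
  if (!condition) throw new Error(`Postcondition violated: ${message}`);
}

function sqrtChecked(x: number): number {
  requires(x >= 0, "input must be non-negative");
  const result = Math.sqrt(x);
  ensures(
    result >= 0 && Math.abs(result * result - x) < 1e-9,
    "result must square back to the input"
  );
  return result;
}
```

<p>The point is the failure mode: a broken contract throws at the exact call site instead of surfacing three modules later as a mysterious NaN.</p>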

<h2 id="the-haiku-modularity--simplicity">The Haiku: Modularity &amp; Simplicity</h2>

<p><img src="/engineering/assets/images/cbc-modularity.jpg" alt="The Haiku" /></p>

<p>Complexity is where bugs hide to reproduce. Keep your components small, pure, and simple.</p>

<ul>
  <li><strong>The Philosophy</strong>: Write code like a Haiku, not a dissertation. If a function does three things, it’s doing two things too many.</li>
</ul>

<h2 id="code-hygiene-avoid-error-prone-patterns">Code Hygiene: Avoid Error-Prone Patterns</h2>

<p><img src="/engineering/assets/images/cbc-lawsuit.jpg" alt="Code Hygiene" /></p>

<p>Some patterns are just slippery floors waiting for a lawsuit.</p>

<ul>
  <li><strong>Don’t</strong>: Use shared mutable state (global variables that everyone touches). That’s like sharing a toothbrush.</li>
  <li><strong>Do</strong>: Use Pure Functions (same input always equals same output) and Result types instead of throwing random Exceptions.</li>
</ul>
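<p>A Result type in TypeScript is just another discriminated union, the same trick the Bouncer section used. A minimal sketch:</p>

```typescript
type Result<T, E> =
  | { ok: true; value: T }
  | { ok: false; error: E };

// Pure function: same input, same output, and the failure case is
// part of the signature instead of a surprise exception.
function parsePositiveInt(raw: string): Result<number, string> {
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) {
    return { ok: false, error: `"${raw}" is not a positive integer` };
  }
  return { ok: true, value: n };
}
```

<p>Callers are forced to check ok before touching value, so the failure path gets handled where it happens.</p>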

<h2 id="the-epilogue-tests-confirm-they-dont-fix">The Epilogue: Tests Confirm, They Don’t Fix</h2>

<p><img src="/engineering/assets/images/cbc-confirm.jpg" alt="Tests Confirm" /></p>

<p>Testing is your safety net, not your construction plan.</p>

<ul>
  <li><strong>The Shift</strong>: If you rely on tests to find all your bugs, you’ve already lost.</li>
  <li><strong>The Goal</strong>: Build it so solid that the tests are just a formality, a victory lap to prove you were right all along.</li>
</ul>

<h2 id="the-grand-finale">The Grand Finale</h2>

<p>Adopt the mindset of a sculptor. You don’t chip away stone and hope it looks like a horse later. You verify every angle as you go.</p>]]></content><author><name>Syed Saad</name></author><category term="Software Engineering Philosophy" /><category term="Code Quality &amp; Maintainability" /><category term="Programming Best Practices" /><category term="Type Systems &amp; Safety" /><category term="Backend &amp; Application Architecture" /><category term="Developer Mindset &amp; Productivity" /><category term="Correctness by Construction" /><category term="typescript" /><category term="Design by Contract" /><category term="Discriminated Unions" /><category term="Functional Programming Concepts" /><category term="Bug Prevention" /><category term="clean code" /><category term="kiss" /><category term="dry" /><category term="srp" /><category term="software development" /><category term="best practices" /><summary type="html"><![CDATA[Writing code is a lot like carpentry. If you eyeball the cut, saw like a maniac with confusion in mind, and then later try to fix the gap with glue, you aren’t building a table, you are building a hazard.]]></summary></entry><entry><title type="html">QuestionPro TechCommit Pune: Scaling Architectures, AI, and Networking</title><link href="https://www.questionpro.com/engineering/technology%20meetup/software%20engineering/data%20science/questionpro-techcommit-pune-scaling-ai-dec-13/" rel="alternate" type="text/html" title="QuestionPro TechCommit Pune: Scaling Architectures, AI, and Networking" /><published>2025-12-05T00:00:00+00:00</published><updated>2025-12-05T00:00:00+00:00</updated><id>https://www.questionpro.com/engineering/technology%20meetup/software%20engineering/data%20science/questionpro-techcommit-pune-scaling-ai-dec-13</id><content type="html" xml:base="https://www.questionpro.com/engineering/technology%20meetup/software%20engineering/data%20science/questionpro-techcommit-pune-scaling-ai-dec-13/"><![CDATA[<p>Are you a 
developer, software engineer, or data scientist in the <strong>Pune</strong> area looking to connect, collaborate, and dive deep into cutting-edge technology?</p>

<p>QuestionPro is excited to announce <strong>TechCommit</strong>, a focused technical meetup designed to accelerate your growth and provide actionable insights into building high-performance, intelligent systems. Join us in our <strong>Pune office</strong> on <strong>Saturday, December 13th</strong>, for two power-packed sessions!</p>

<h2 id="️-essential-event-details">🗓️ Essential Event Details</h2>

<table>
  <thead>
    <tr>
      <th style="text-align: left">Detail</th>
      <th style="text-align: left">Information</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left"><strong>What</strong></td>
      <td style="text-align: left">QuestionPro TechCommit Developer Meetup</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>When</strong></td>
      <td style="text-align: left">Saturday, December 13th (10:00 AM – 1:30 PM IST)</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>Where</strong></td>
      <td style="text-align: left"><a href="https://maps.app.goo.gl/b2j8oBmmtWum9GAp7">QuestionPro, 1B, Nano Space IT Park, Baner, Pune, Maharashtra</a></td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>Registration Deadline</strong></td>
      <td style="text-align: left">Thursday, December 11th</td>
    </tr>
  </tbody>
</table>

<hr />

<h2 id="️-agenda-deep-dives-by-questionpro-experts">🎙️ Agenda: Deep Dives by QuestionPro Experts</h2>

<p>We have lined up two high-impact talks from our senior experts, focusing on the real-world challenges of building highly available systems and leveraging Artificial Intelligence for data analysis.</p>

<h3 id="talk-1-from-zero-to-10k-live-users-scaling-real-time-engagement-platforms">Talk 1: From Zero to 10k Live Users: Scaling Real-time Engagement Platforms</h3>

<p><strong>Speaker: <a href="https://www.linkedin.com/in/rohan-rao-kn8629/">Rohan Rao</a>, Software Engineer</strong></p>

<p>Building a real-time platform that handles thousands of concurrent users is a critical challenge. Rohan Rao, a Software Engineer with over 7 years of experience, will share lessons learned and practical strategies for:</p>

<ul>
  <li>Designing resilient and <strong>scalable architectures</strong>.</li>
  <li>Implementing <strong>high-performance solutions</strong> that maintain speed under heavy load.</li>
</ul>

<h3 id="talk-2-ai-in-action-building-a-smarter-topic-analysis-system">Talk 2: AI in Action: Building a Smarter Topic Analysis System</h3>

<p><strong>Speaker: <a href="https://www.linkedin.com/in/pratikdhulubulu/">Pratik Dhulubulu</a>, Data Scientist</strong></p>

<p>Artificial intelligence is rapidly transforming how companies understand user data. Pratik Dhulubulu, a Data Scientist with 4+ years of experience, will move past the hype and demonstrate how to:</p>

<ul>
  <li>Leverage AI to <strong>solve complex data problems</strong>.</li>
  <li>Build a sophisticated, data-backed <strong>topic analysis system</strong> using modern <strong>Python</strong> techniques.</li>
</ul>

<hr />

<h2 id="why-this-pune-tech-meetup-is-a-must-attend">Why This Pune Tech Meetup Is a Must-Attend</h2>

<ol>
  <li><strong>Focused Learning:</strong> Gain high-value, technical knowledge from experienced practitioners in a short, efficient format.</li>
  <li><strong>Networking:</strong> Connect face-to-face with fellow software engineers, data scientists, and developers in the vibrant <strong>Pune tech scene</strong>.</li>
  <li><strong>Community:</strong> Discuss the real-world implementation challenges of <strong>scaling</strong> and <strong>AI</strong> in a professional environment.</li>
</ol>

<h2 id="secure-your-spot">Secure Your Spot</h2>

<p>Spaces for this <strong>developer meetup in Pune</strong> are limited. Don’t miss out!</p>

<p>The <strong>last day to register is Thursday, December 11th</strong>.</p>

<p><strong><a href="https://techcommit.questionpro.com/13-dec-2025">Register Now</a></strong></p>

<p>We look forward to connecting with you at the QuestionPro office in Baner!</p>]]></content><author><name>QuestionPro Events Team</name></author><category term="Technology Meetup" /><category term="Software Engineering" /><category term="Data Science" /><category term="Pune Tech Meetup" /><category term="Scaling Architecture" /><category term="Real-time Systems" /><category term="AI" /><category term="Data Science" /><category term="QuestionPro" /><category term="developer meetup" /><summary type="html"><![CDATA[Are you a developer, software engineer, or data scientist in the Pune area looking to connect, collaborate, and dive deep into cutting-edge technology?]]></summary></entry><entry><title type="html">Scaling LivePolls to 10K Concurrent Users - A Journey Beyond Limits</title><link href="https://www.questionpro.com/engineering/engineering/scalability/performance%20optimization/scaling-livepolls-to-10k-concurrent-users/" rel="alternate" type="text/html" title="Scaling LivePolls to 10K Concurrent Users - A Journey Beyond Limits" /><published>2025-11-06T00:00:00+00:00</published><updated>2025-11-06T00:00:00+00:00</updated><id>https://www.questionpro.com/engineering/engineering/scalability/performance%20optimization/scaling-livepolls-to-10k-concurrent-users</id><content type="html" xml:base="https://www.questionpro.com/engineering/engineering/scalability/performance%20optimization/scaling-livepolls-to-10k-concurrent-users/"><![CDATA[<p>A software product that solves a business problem is good. But a software product that scales as the business grows - that’s <strong>excellent</strong>!</p>

<p>As the saying goes:</p>

<blockquote>
  <p>“In the digital age, scaling is the compass that guides your product toward success in the vast ocean of possibilities.”</p>
</blockquote>

<p>This blog is a story of how we scaled <strong>LivePolls</strong>, a real-time engagement product from QuestionPro, to handle a staggering <strong>10,000 participants in a single session</strong> - on a single server.</p>

<p>You’ll see the problems we faced, the creative solutions we engineered, and the lessons we learned along the way. So grab your favorite drink and some snacks - this is a scaling story you’ll actually enjoy reading.</p>

<h2 id="a-bit-of-context">A Bit of Context</h2>

<p><strong>LivePolls</strong> is a free product from QuestionPro that allows users to engage audiences in real time - through polls, quizzes, and interactive questions.</p>

<ul>
  <li>The <strong>Admin</strong> creates a quiz or poll and starts a live session</li>
  <li><strong>Participants</strong> join using a session PIN</li>
  <li>Everyone engages together in real-time through dynamic charts, leaderboards, and results</li>
</ul>

<p>At its core, LivePolls uses <strong>WebSockets</strong> for fast, real-time communication between clients and servers. And when we say real-time, we mean <strong>milliseconds matter</strong>.</p>

<h2 id="the-challenge">The Challenge</h2>

<p>One of QuestionPro’s enterprise customers wanted to host a LivePoll session with over <strong>5,000 participants</strong>.</p>

<p>Our previous tests had maxed out around <strong>2,000 participants per server</strong>. Beyond that, things… well, broke.</p>

<p>We lacked infrastructure, saw memory leaks, hit socket connection limits, and faced painfully slow queries. And thus began our epic journey to scale.</p>

<h2 id="problems-we-faced">Problems We Faced</h2>

<ol>
  <li>No infrastructure to load-test beyond 2,000 participants</li>
  <li>Memory leak in the application</li>
  <li>Nginx limits on concurrent socket connections</li>
  <li>Slow MySQL queries causing high memory usage and timeouts</li>
</ol>

<h2 id="solutions-we-devised">Solutions We Devised</h2>

<h3 id="1-building-infrastructure-for-10000-virtual-participants">1. Building Infrastructure for 10,000 Virtual Participants</h3>

<p>Since LivePolls uses WebSockets, we first tried <strong>Artillery.io</strong> for load testing - it’s a solid library for socket-based tests.</p>

<p>But we quickly hit a wall:</p>

<ul>
  <li>Only ~1,000 WebSocket connections per local machine</li>
  <li>A steep learning curve for complex test flows (like joining, answering, viewing results)</li>
</ul>

<p>So, we decided to go rogue and wrote our own simple JavaScript script to simulate participant actions:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>joining → answering → viewing results
</code></pre></div></div>
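<p>Sketched out, a participant simulator can be this small (illustrative only - the event names, payloads, and the <code>planVMs</code> helper are assumptions, not our production script; the WebSocket client is the one built into recent Node.js versions):</p>

```javascript
// Hypothetical participant simulator: joining → answering → viewing results.
// Event names and payload shapes are assumptions for illustration.

// Each free-tier VM held roughly 1,000 sockets, so this tells us how many
// VMs a target participant count needs (10,000 / 1,000 = 10).
function planVMs(totalParticipants, connectionsPerVM) {
  return Math.ceil(totalParticipants / connectionsPerVM);
}

function simulateParticipant(url, pin) {
  // The global WebSocket client requires a recent Node.js runtime.
  const ws = new WebSocket(url);
  ws.addEventListener("open", () => {
    ws.send(JSON.stringify({ type: "join", pin })); // joining
  });
  ws.addEventListener("message", (event) => {
    const msg = JSON.parse(event.data);
    if (msg.type === "question") {
      // answering: always pick the first option
      ws.send(JSON.stringify({ type: "answer", choice: 0 }));
    }
    // "result" messages are simply received - that is "viewing results"
  });
  return ws;
}
```

<p>One such script per connection, a thousand connections per VM, ten VMs - and the virtual audience is ready.</p>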

<p>Then came the real question - where do we run these thousands of connections?</p>

<p>We leveraged <strong>free-tier cloud VMs from AWS</strong>, since:</p>

<ul>
  <li>There was no limit on the number of VMs we could spin up</li>
  <li>Each VM could handle around 1,000 socket connections</li>
  <li>So, <strong>10 VMs = 10,000 virtual participants</strong>. Simple math. Beautiful scaling.</li>
</ul>

<p>When we later moved the setup to our in-house test servers, we had to tweak some system parameters (like increasing file descriptors):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># /etc/security/limits.conf</span>
your_user soft nofile 1000000
your_user hard nofile 1000000
</code></pre></div></div>

<p>Each socket uses one file descriptor - so, <strong>no descriptors = no sockets = no fun</strong>.</p>

<h3 id="2-fixing-the-memory-leak">2. Fixing the Memory Leak</h3>

<p>A <strong>memory leak</strong> occurs when an app allocates memory but doesn’t release it properly.</p>

<p>After several smaller tests, we found that memory usage never dropped back to normal even after participants disconnected. Suspicious. 🤔</p>

<p>We dug deeper using <strong>heap snapshots</strong> and discovered that WebSocket connections were not being released after closing. Our stack uses <strong>Socket.io</strong>, which internally uses the <strong>WS</strong> library for WebSocket handling.</p>

<p><img src="/engineering/assets/images/scaling-livepolls-memory-leak-image-1.png" alt="Memory leak" /></p>

<p>Then came the <strong>“aha” moment</strong> - we found <strong>EIOWS</strong>, a C++-based, high-performance, low-memory WebSocket engine, fully compatible with Socket.io.</p>

<p>With just <strong>one line of code changed</strong>, we swapped WS with EIOWS - and boom, no more leaks. <em>(If only all life’s problems could be solved this easily.)</em></p>
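<p>For reference, the swap goes through Socket.io’s <code>wsEngine</code> server option (package names as published on npm; the wrapper function here is illustrative and is not part of our codebase):</p>

```javascript
// Illustrative: swap Socket.io's default "ws" engine for eiows.
// Assumes the socket.io and eiows packages are installed; this helper
// only sketches the change and is never invoked here.
function createSocketServer(httpServer) {
  const { Server } = require("socket.io");
  return new Server(httpServer, {
    wsEngine: require("eiows").Server, // the one changed line
  });
}
```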

<p><img src="/engineering/assets/images/scaling-livepolls-memory-leak-image-2.png" alt="Memory leak fixed" /></p>

<h3 id="3-tuning-nginx-for-10k-concurrent-connections">3. Tuning Nginx for 10K+ Concurrent Connections</h3>

<p>Once memory was stable, we went for the big one - <strong>5K participants</strong>.</p>

<p>…and hit a wall again.</p>

<p>The error?</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Too many open files
</code></pre></div></div>

<p>Classic. 😅</p>

<p>We increased:</p>

<div class="language-nginx highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">worker_connections</span> <span class="mi">8192</span><span class="p">;</span>
<span class="k">worker_processes</span> <span class="mi">8</span><span class="p">;</span>
</code></pre></div></div>

<p>We expected roughly 64K connections (8 workers × 8,192 each) - but we still crashed around 4K.</p>

<p><strong>Turns out</strong>, each connection in Nginx uses <strong>2 file descriptors</strong> - one for the client, one for the upstream server.</p>

<p>So, we added:</p>

<div class="language-nginx highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">worker_rlimit_nofile</span> <span class="mi">16192</span><span class="p">;</span>
</code></pre></div></div>

<p>This allowed each worker process to handle roughly 8K connections (16,192 descriptors ÷ 2 per connection), comfortably crossing the <strong>10K mark</strong> across workers.</p>
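<p>The descriptor math is worth spelling out - with two descriptors per proxied connection, capacity per worker is simply the descriptor limit halved:</p>

```javascript
// Nginx spends 2 file descriptors per proxied connection
// (one on the client side, one on the upstream side), so:
function maxProxiedConnections(workerRlimitNofile) {
  return Math.floor(workerRlimitNofile / 2);
}

console.log(maxProxiedConnections(16192)); // 8096 connections per worker
```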

<h3 id="4-optimizing-slow-mysql-queries">4. Optimizing Slow MySQL Queries</h3>

<p>With 10K people connected, the next step was an end-to-end test: participants answer, admin views charts/leaderboards, participants view relative rank, etc.</p>

<p>We found the application server <strong>crashed</strong> when admin requested result charts. The culprit: queries pulling all rows into application memory and then aggregating in JavaScript - a disaster at 10K+ rows.</p>

<p>Below are concrete examples showing <strong>before (inefficient)</strong> and <strong>after (optimized)</strong> queries we used.</p>

<h4 id="naive--inefficient-approach-what-we-had">Naive / Inefficient Approach (what we had)</h4>

<p>These queries fetch all rows into the app and then process them in JS:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- inefficient, pulls all rows</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">responses</span> <span class="k">WHERE</span> <span class="n">question_id</span> <span class="o">=</span> <span class="mi">123</span><span class="p">;</span>
</code></pre></div></div>

<p>If you then sort or aggregate on this in-app, memory usage explodes. 💥</p>

<h4 id="optimized-queries---let-mysql-do-the-heavy-lifting">Optimized Queries - Let MySQL Do the Heavy Lifting</h4>

<h5 id="1-get-response-count-for-each-answer-option-for-charting">1. Get response count for each answer option (for charting)</h5>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">answer_option_id</span><span class="p">,</span> <span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="k">AS</span> <span class="n">response_count</span>
<span class="k">FROM</span> <span class="n">responses</span>
<span class="k">WHERE</span> <span class="n">question_id</span> <span class="o">=</span> <span class="mi">123</span>
<span class="k">GROUP</span> <span class="k">BY</span> <span class="n">answer_option_id</span><span class="p">;</span>
</code></pre></div></div>

<p><strong>Why better:</strong> MySQL aggregates rows on the server and returns a tiny result set (one row per option), not 10K rows.</p>

<h5 id="2-get-respondents-sorted-by-score-and-limit-leaderboard">2. Get respondents sorted by score and limit (leaderboard)</h5>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">respondent_id</span><span class="p">,</span> <span class="n">total_score</span>
<span class="k">FROM</span> <span class="n">respondents_scores</span>
<span class="k">WHERE</span> <span class="n">session_id</span> <span class="o">=</span> <span class="mi">456</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">total_score</span> <span class="k">DESC</span>
<span class="k">LIMIT</span> <span class="mi">100</span><span class="p">;</span>
</code></pre></div></div>

<p><strong>Why better:</strong> Database-level <code class="language-plaintext highlighter-rouge">ORDER BY</code> + <code class="language-plaintext highlighter-rouge">LIMIT</code> avoids fetching all respondents into memory.</p>
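<p>To make the contrast concrete, here is roughly what the old in-app aggregation did (the row shape is hypothetical) - work that the <code>GROUP BY</code> now pushes down to MySQL:</p>

```javascript
// The old approach: hold every raw response row in memory and count in JS.
// At 10K+ rows per question, this is exactly what blew up the app server.
function countByOption(rows) {
  const counts = {};
  for (const row of rows) {
    counts[row.answer_option_id] = (counts[row.answer_option_id] || 0) + 1;
  }
  return counts;
}

const rows = [
  { answer_option_id: 1 },
  { answer_option_id: 2 },
  { answer_option_id: 1 },
];
// option 1 seen twice, option 2 once - the GROUP BY returns this directly
console.log(countByOption(rows));
```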

<h2 id="the-result">The Result</h2>

<p>After fixing memory leaks, tuning Nginx, and optimizing queries:</p>

<p>✅ <strong>LivePolls successfully handled 10,000 concurrent users</strong> on a single app server (2 CPU, 4GB RAM)<br />
✅ Multiple full-scale tests were conducted smoothly<br />
✅ No crashes, no leaks, no long waits - just 10,000 people engaging in real-time</p>

<h2 id="final-thoughts">Final Thoughts</h2>

<p>Scaling isn’t just about tweaking parameters - it’s about <strong>learning how your system breathes under pressure</strong>.</p>

<p>We hope this story gave you a few ideas, maybe even some laughs, and definitely the confidence to take your own systems beyond limits.</p>

<p>Remember:</p>

<blockquote>
  <p>“Scaling isn’t a destination; it’s a journey of continuous improvement and evolution.”</p>
</blockquote>

<p>And sometimes, that journey starts with a single config file. 🚀</p>]]></content><author><name>Rohan Rao</name></author><category term="Engineering" /><category term="Scalability" /><category term="Performance Optimization" /><category term="WebSockets" /><category term="Performance" /><category term="Load Testing" /><category term="Nginx" /><category term="Real-time Systems" /><summary type="html"><![CDATA[A software product that solves a business problem is good. But a software product that scales as the business grows - that’s excellent!]]></summary></entry><entry><title type="html">The Conscious Vibe Coder</title><link href="https://www.questionpro.com/engineering/ai%20in%20development/software%20engineering/developer%20mindset/the-conscious-vibe-coder/" rel="alternate" type="text/html" title="The Conscious Vibe Coder" /><published>2025-10-30T00:00:00+00:00</published><updated>2025-10-30T00:00:00+00:00</updated><id>https://www.questionpro.com/engineering/ai%20in%20development/software%20engineering/developer%20mindset/the-conscious-vibe-coder</id><content type="html" xml:base="https://www.questionpro.com/engineering/ai%20in%20development/software%20engineering/developer%20mindset/the-conscious-vibe-coder/"><![CDATA[<p>The Conscious Vibe Coder :
Thinking Beyond the Autocomplete
How thoughtful coding separates professionals from prompt typists</p>

<p>In today’s world of AI-assisted coding, where tools like GitHub Copilot, ChatGPT, and Cursor can generate entire modules in seconds, many developers are falling into a subtle trap — they are typing prompts, not writing solutions. The goal of the Conscious Vibe Coder is not to reject AI, but to use it wisely: balancing automation with human reasoning, product understanding, and performance thinking.</p>

<h2 id="the-vibe-shift-in-coding">The Vibe Shift in Coding</h2>

<p>Coding used to be about writing logic line by line. Today, it’s about designing, reviewing, and refining AI-generated logic. Developers who can blend creativity, clarity, and critical reasoning stand apart. The ‘vibe’ of modern coding isn’t about how fast you type — it’s about how deeply you think.</p>

<h2 id="boilerplate-isnt-the-enemy-but-mindless-acceptance-is">Boilerplate Isn’t the Enemy, But Mindless Acceptance Is</h2>

<p>AI excels at generating repetitive or boilerplate code. Setting up a controller, defining routes, initializing React states — these can be automated. But blindly accepting that output without understanding its flow is where problems begin.</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">@</span><span class="nd">Controller</span><span class="p">(</span><span class="dl">"</span><span class="s2">users</span><span class="dl">"</span><span class="p">)</span>
<span class="k">export</span> <span class="kd">class</span> <span class="nc">UserController</span> <span class="p">{</span>
  <span class="nf">constructor</span><span class="p">(</span><span class="k">private</span> <span class="k">readonly</span> <span class="nx">userService</span><span class="p">:</span> <span class="nx">UserService</span><span class="p">)</span> <span class="p">{}</span>

  <span class="p">@</span><span class="nd">Get</span><span class="p">(</span><span class="dl">"</span><span class="s2">:id</span><span class="dl">"</span><span class="p">)</span>
  <span class="nf">findOne</span><span class="p">(@</span><span class="nd">Param</span><span class="p">(</span><span class="dl">"</span><span class="s2">id</span><span class="dl">"</span><span class="p">)</span> <span class="nx">id</span><span class="p">:</span> <span class="kr">string</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">return</span> <span class="k">this</span><span class="p">.</span><span class="nx">userService</span><span class="p">.</span><span class="nf">findOne</span><span class="p">(</span><span class="o">+</span><span class="nx">id</span><span class="p">);</span>
  <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>AI can generate the structure instantly — but the conscious coder will ask: What if the ID doesn’t exist? Should we handle exceptions? Do we need caching or role-based access control? That’s the human layer AI cannot replace.</p>

<h2 id="clarity-of-thought-drives-code-quality">Clarity of Thought Drives Code Quality</h2>

<p>Before prompting AI, clarity matters more than syntax. If your instructions are vague, your output will reflect it. A conscious coder thinks like an architect — not a script generator.</p>

<p>❌ “Build a booking API.”</p>

<p>✅ “Create a Spring Boot REST API for hotel bookings with endpoints to create, cancel, and list bookings. Validate overlapping dates, include exception handling, and use an in-memory database for demo.”</p>

<p>The difference isn’t just in wording; it’s in mindset. Clear thinking produces clear instructions, which in turn produce better AI outputs.</p>

<h2 id="review-like-a-mentor-not-a-machine">Review Like a Mentor, Not a Machine</h2>

<p>AI often produces code that ‘works’ but doesn’t ‘scale’. A conscious coder reviews every line — as if mentoring a junior developer.</p>

<p>The question is never
‘Does it run?’
but
‘Is it robust, readable, and reliable?’</p>

<p>AI-Generated Example:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">public</span> <span class="nc">UserProfile</span> <span class="nf">getProfile</span><span class="o">(</span><span class="nc">Long</span> <span class="n">userId</span><span class="o">)</span> <span class="o">{</span>
  <span class="k">return</span> <span class="n">userRepository</span><span class="o">.</span><span class="na">findById</span><span class="o">(</span><span class="n">userId</span><span class="o">).</span><span class="na">get</span><span class="o">();</span>
<span class="o">}</span>
</code></pre></div></div>

<p>Refactored Version:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">public</span> <span class="nc">UserProfile</span> <span class="nf">getProfile</span><span class="o">(</span><span class="nc">Long</span> <span class="n">userId</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">userRepository</span><span class="o">.</span><span class="na">findById</span><span class="o">(</span><span class="n">userId</span><span class="o">)</span>
        <span class="o">.</span><span class="na">orElseThrow</span><span class="o">(()</span> <span class="o">-&gt;</span> <span class="k">new</span> <span class="nc">ResourceNotFoundException</span><span class="o">(</span><span class="s">"User not found: "</span> <span class="o">+</span> <span class="n">userId</span><span class="o">));</span>
<span class="o">}</span>
</code></pre></div></div>

<p>The improved version avoids runtime crashes, adds contextual errors, and is maintainable.
Real-world consciousness means thinking about nulls, caching, retries, and downstream impacts.</p>

<h2 id="the-human-advantage-in-the-loop">The Human Advantage in the Loop</h2>

<p>AI tools can predict syntax, but they can’t predict intent.
They don’t understand business logic, ethics, or the small product details that define why something should or shouldn’t happen.
Imagine an AI-generated discount system for an e-commerce site:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">public</span> <span class="kt">double</span> <span class="nf">calculateFinalPrice</span><span class="o">(</span><span class="kt">double</span> <span class="n">price</span><span class="o">,</span> <span class="kt">double</span> <span class="n">discount</span><span class="o">,</span> <span class="kt">double</span> <span class="n">coupon</span><span class="o">)</span> <span class="o">{</span>
  <span class="k">return</span> <span class="n">price</span> <span class="o">-</span> <span class="n">discount</span> <span class="o">-</span> <span class="n">coupon</span><span class="o">;</span>
<span class="o">}</span>
</code></pre></div></div>

<p>It works mathematically — but fails logically.</p>

<p>In your product, discounts and coupons aren’t supposed to stack.
So the correct version is:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">public</span> <span class="kt">double</span> <span class="nf">calculateFinalPrice</span><span class="o">(</span><span class="kt">double</span> <span class="n">price</span><span class="o">,</span> <span class="kt">double</span> <span class="n">discount</span><span class="o">,</span> <span class="kt">double</span> <span class="n">coupon</span><span class="o">)</span> <span class="o">{</span>
  <span class="kt">double</span> <span class="n">appliedDiscount</span> <span class="o">=</span> <span class="nc">Math</span><span class="o">.</span><span class="na">max</span><span class="o">(</span><span class="n">discount</span><span class="o">,</span> <span class="n">coupon</span><span class="o">);</span>
  <span class="k">return</span> <span class="n">price</span> <span class="o">-</span> <span class="n">appliedDiscount</span><span class="o">;</span>
<span class="o">}</span>
</code></pre></div></div>

<p>That difference — understanding business logic, user impact, and edge conditions — is the human layer that AI doesn’t see.
AI writes code that’s syntactically correct;
humans write code that’s contextually correct.
A conscious developer reads AI output like a reviewer, not a consumer.
You question assumptions, ask “What if?”, and make the code align with real-world behavior — not just logic.</p>

<h2 id="coding-with-product-empathy">Coding with Product Empathy</h2>

<p>Coding isn’t just about systems; it’s about people.
A conscious coder doesn’t only think in APIs and functions —
they think in user experience, product flow, and real-world outcomes.
Imagine AI writes a simple payment retry function:</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="kd">function</span> <span class="nf">processPayment</span><span class="p">(</span><span class="nx">orderId</span><span class="p">:</span> <span class="kr">string</span><span class="p">)</span> <span class="p">{</span>
  <span class="kd">let</span> <span class="nx">attempts</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
  <span class="k">while </span><span class="p">(</span><span class="nx">attempts</span> <span class="o">&lt;</span> <span class="mi">3</span><span class="p">)</span> <span class="p">{</span>
    <span class="kd">const</span> <span class="nx">success</span> <span class="o">=</span> <span class="k">await</span> <span class="nf">charge</span><span class="p">(</span><span class="nx">orderId</span><span class="p">);</span>
    <span class="k">if </span><span class="p">(</span><span class="nx">success</span><span class="p">)</span> <span class="k">break</span><span class="p">;</span>
    <span class="nx">attempts</span><span class="o">++</span><span class="p">;</span>
  <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Technically fine — but logically dangerous.
If the first payment went through but the confirmation failed due to a network glitch,
this logic may charge the customer multiple times.
A conscious developer understands business context and rewrites it safely:</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="kd">function</span> <span class="nf">processPayment</span><span class="p">(</span><span class="nx">orderId</span><span class="p">:</span> <span class="kr">string</span><span class="p">)</span> <span class="p">{</span>
  <span class="kd">const</span> <span class="nx">existing</span> <span class="o">=</span> <span class="k">await</span> <span class="nf">checkTransactionStatus</span><span class="p">(</span><span class="nx">orderId</span><span class="p">);</span>
  <span class="k">if </span><span class="p">(</span><span class="nx">existing</span><span class="p">?.</span><span class="nx">status</span> <span class="o">===</span> <span class="dl">"</span><span class="s2">SUCCESS</span><span class="dl">"</span><span class="p">)</span> <span class="k">return</span> <span class="nx">existing</span><span class="p">;</span>

  <span class="kd">let</span> <span class="nx">attempts</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
  <span class="k">while </span><span class="p">(</span><span class="nx">attempts</span> <span class="o">&lt;</span> <span class="mi">3</span><span class="p">)</span> <span class="p">{</span>
    <span class="kd">const</span> <span class="nx">success</span> <span class="o">=</span> <span class="k">await</span> <span class="nf">charge</span><span class="p">(</span><span class="nx">orderId</span><span class="p">);</span>
    <span class="k">if </span><span class="p">(</span><span class="nx">success</span><span class="p">)</span> <span class="k">return</span> <span class="nx">success</span><span class="p">;</span>
    <span class="nx">attempts</span><span class="o">++</span><span class="p">;</span>
  <span class="p">}</span>
  <span class="k">throw</span> <span class="k">new</span> <span class="nc">Error</span><span class="p">(</span><span class="dl">"</span><span class="s2">Payment failed after retries</span><span class="dl">"</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Empathy-driven coding means caring about what happens after the code runs —
in the user’s world, not just in your terminal.</p>

<h2 id="code-that-runs-vs-code-that-scales">Code That Runs vs. Code That Scales</h2>

<p>Here’s where most developers slip.</p>

<p>AI-generated code can pass tests, but may crumble under real load, missing caching, concurrency handling, or memory management.
Conscious coders always test the system’s behavior, not just its syntax.</p>

<p>AI-Generated Example:</p>

<div class="language-tsx highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err"></span><span class="nf">useEffect</span><span class="p">(()</span> <span class="o">=&gt;</span> <span class="p">{</span>
<span class="nf">fetch</span><span class="p">(</span><span class="dl">'</span><span class="s1">/api/products</span><span class="dl">'</span><span class="p">)</span>
<span class="p">.</span><span class="nf">then</span><span class="p">(</span><span class="nx">res</span> <span class="o">=&gt;</span> <span class="nx">res</span><span class="p">.</span><span class="nf">json</span><span class="p">())</span>
<span class="p">.</span><span class="nf">then</span><span class="p">(</span><span class="nx">setProducts</span><span class="p">);</span>
<span class="p">},</span> <span class="p">[]);</span>
</code></pre></div></div>

<p>Refactored Version:</p>

<div class="language-tsx highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="p">[</span><span class="nx">loading</span><span class="p">,</span> <span class="nx">setLoading</span><span class="p">]</span> <span class="o">=</span> <span class="nf">useState</span><span class="p">(</span><span class="kc">true</span><span class="p">);</span>
<span class="kd">const</span> <span class="p">[</span><span class="nx">error</span><span class="p">,</span> <span class="nx">setError</span><span class="p">]</span> <span class="o">=</span> <span class="nf">useState</span><span class="p">(</span><span class="kc">null</span><span class="p">);</span>

<span class="nf">useEffect</span><span class="p">(()</span> <span class="o">=&gt;</span> <span class="p">{</span>
  <span class="kd">const</span> <span class="nx">controller</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">AbortController</span><span class="p">();</span>
  <span class="kd">const</span> <span class="nx">fetchData</span> <span class="o">=</span> <span class="k">async </span><span class="p">()</span> <span class="o">=&gt;</span> <span class="p">{</span>
    <span class="k">try</span> <span class="p">{</span>
      <span class="kd">const</span> <span class="nx">res</span> <span class="o">=</span> <span class="k">await</span> <span class="nf">fetch</span><span class="p">(</span><span class="dl">"</span><span class="s2">/api/products</span><span class="dl">"</span><span class="p">,</span> <span class="p">{</span> <span class="na">signal</span><span class="p">:</span> <span class="nx">controller</span><span class="p">.</span><span class="nx">signal</span> <span class="p">});</span>
      <span class="k">if </span><span class="p">(</span><span class="o">!</span><span class="nx">res</span><span class="p">.</span><span class="nx">ok</span><span class="p">)</span> <span class="k">throw</span> <span class="k">new</span> <span class="nc">Error</span><span class="p">(</span><span class="dl">"</span><span class="s2">Failed to fetch</span><span class="dl">"</span><span class="p">);</span>
      <span class="kd">const</span> <span class="nx">data</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">res</span><span class="p">.</span><span class="nf">json</span><span class="p">();</span>
      <span class="nf">setProducts</span><span class="p">(</span><span class="nx">data</span><span class="p">);</span>
    <span class="p">}</span> <span class="k">catch </span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="p">{</span>
      <span class="k">if </span><span class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nx">name</span> <span class="o">!==</span> <span class="dl">"</span><span class="s2">AbortError</span><span class="dl">"</span><span class="p">)</span> <span class="nf">setError</span><span class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nx">message</span><span class="p">);</span>
    <span class="p">}</span> <span class="k">finally</span> <span class="p">{</span>
      <span class="nf">setLoading</span><span class="p">(</span><span class="kc">false</span><span class="p">);</span>
    <span class="p">}</span>
  <span class="p">};</span>
  <span class="nf">fetchData</span><span class="p">();</span>
  <span class="k">return </span><span class="p">()</span> <span class="o">=&gt;</span> <span class="nx">controller</span><span class="p">.</span><span class="nf">abort</span><span class="p">();</span>
<span class="p">},</span> <span class="p">[]);</span>
</code></pre></div></div>

<p>The second version anticipates real scenarios — slow networks, user cancellations, and error handling. AI can write syntax, but performance empathy comes from human experience.</p>

<h2 id="the-conscious-loop-ai--human-refinement">The Conscious Loop: AI + Human Refinement</h2>

<p>Being a Conscious Vibe Coder doesn’t mean rejecting AI — it means co-creating intelligently. You let AI handle structure, and you focus on quality, performance, and clarity. Think of it as a loop: instruct → generate → review → refactor → re-instruct.
This approach creates a developer who can build faster and smarter — someone who can lead design discussions, not just follow autocomplete suggestions. The conscious coder owns the solution, not the syntax.</p>

<h2 id="final-thoughts">Final Thoughts</h2>

<p>In the age of AI-powered development, our value as engineers lies in our ability to think critically, design for humans, and question every line.
Conscious Vibe Coders don’t just code — they create systems that last.
Stay thoughtful. Stay curious. Stay conscious.</p>]]></content><author><name>Kapil Karandikar</name></author><category term="AI in Development" /><category term="Software Engineering" /><category term="Developer Mindset" /><category term="ai-coding" /><category term="clarity" /><category term="refactoring" /><category term="typescript" /><category term="react" /><category term="nestjs" /><category term="java" /><summary type="html"><![CDATA[Exploring how conscious developers can blend human reasoning with AI assistance to write clear, scalable, and meaningful code.]]></summary></entry><entry><title type="html">Manual Topic Mapping to AI-Powered Discovery: Our Journey Building a Smarter Topic Analysis System</title><link href="https://www.questionpro.com/engineering/ai/machine%20learning/nlp/text%20analysis/journey-building-smarter-topic-analysis/" rel="alternate" type="text/html" title="Manual Topic Mapping to AI-Powered Discovery: Our Journey Building a Smarter Topic Analysis System" /><published>2025-10-15T00:00:00+00:00</published><updated>2025-10-15T00:00:00+00:00</updated><id>https://www.questionpro.com/engineering/ai/machine%20learning/nlp/text%20analysis/journey-building-smarter-topic-analysis</id><content type="html" xml:base="https://www.questionpro.com/engineering/ai/machine%20learning/nlp/text%20analysis/journey-building-smarter-topic-analysis/"><![CDATA[<h2 id="the-beginning-discover-and-how-we-used-to-work">The Beginning: Discover and How We Used to Work</h2>

<p>For years, our text analysis platform Discover was the workhorse of our operations. It was sophisticated, accurate, and reliable. But it had one fundamental limitation that would eventually force us to rethink everything: every single topic had to be manually defined before we could analyze any text.</p>

<h3 id="how-discover-worked">How Discover Worked</h3>

<p>Imagine you’re analyzing customer feedback for a restaurant chain. Before Discover could process a single review, we needed domain experts to sit down and create a comprehensive list of topics:</p>

<ul>
  <li>“food quality”</li>
  <li>“service speed”</li>
  <li>“ambiance”</li>
  <li>“cleanliness”</li>
  <li>“value for money”</li>
  <li>… and potentially 50+ more</li>
</ul>

<p>Once these topics were defined, Discover would read each piece of text and figure out which predefined topics it matched. The system was actually quite clever about this—it offered three different ways to match text to topics:</p>

<p><strong>RAG-based Tagging</strong> used advanced AI to understand context and map text to the closest matching topic. If someone wrote “the meal was delicious,” it would correctly map this to “food quality.”</p>

<p><strong>Rule-based Extraction</strong> used linguistic patterns and grammar rules to extract topic-opinion pairs from sentences.</p>

<p><strong>Language Model Extraction</strong> used fine-tuned AI models to directly generate topic-opinion-sentiment combinations.</p>

<p>The output was beautifully structured. For the review “The food was excellent but service was slow,” Discover would produce:</p>

<table>
  <thead>
    <tr>
      <th>Topic</th>
      <th>Opinion</th>
      <th>Sentiment</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>“food quality”</td>
      <td>“excellent”</td>
      <td>Positive</td>
    </tr>
    <tr>
      <td>“service speed”</td>
      <td>“slow”</td>
      <td>Negative</td>
    </tr>
  </tbody>
</table>

<p>It worked brilliantly. Until it didn’t.</p>

<h2 id="when-the-system-started-breaking-down">When the System Started Breaking Down</h2>

<h3 id="the-expert-bottleneck">The Expert Bottleneck</h3>

<p>Every new client meant weeks of preparation. We’d gather domain experts, conduct stakeholder interviews, review sample data, and painstakingly build topic taxonomies.</p>

<ul>
  <li>For a new restaurant chain: 2-3 weeks of topic curation</li>
  <li>For a healthcare feedback system: 3-4 weeks</li>
  <li>For a complex B2B SaaS product: Sometimes a month or more</li>
</ul>

<p>We were spending more time defining topics than actually analyzing data. Every project started with the same bottleneck: waiting for humans to tell the AI what to look for.</p>

<h3 id="the-problem-of-emerging-topics">The Problem of Emerging Topics</h3>

<p><img src="/engineering/assets/images/journey-topic-analysis-discover.png" alt="Emerging Topic Issue" /></p>

<p>Here’s where things got really frustrating. The world doesn’t wait for your topic taxonomy to catch up.</p>

<p>When COVID hit, suddenly customers were talking about:</p>

<ul>
  <li>“contactless delivery”</li>
  <li>“safety protocols”</li>
  <li>“mask enforcement”</li>
  <li>“outdoor seating”</li>
</ul>

<p>Discover couldn’t see these topics. They weren’t in our predefined list. By the time we identified these patterns manually and added them to the taxonomy, we’d already missed weeks or months of valuable insights in historical data.</p>

<p><strong>The painful reality:</strong> We were always looking backward, never forward.</p>

<h3 id="the-granularity-dilemma">The Granularity Dilemma</h3>

<p>Different people needed different views of the same data:</p>

<ul>
  <li><strong>Marketing teams</strong> wanted broad themes: “Is customer service a problem?”</li>
  <li><strong>Operations teams</strong> wanted specifics: “Is it the checkout process or the return policy causing issues?”</li>
  <li><strong>Executives</strong> wanted strategic insights: “How does our brand perception compare to competitors?”</li>
</ul>

<p>Discover forced us to choose one granularity level upfront. Want to switch from high-level themes to detailed issues? Reprocess everything with a new topic taxonomy.</p>

<h3 id="the-multilingual-nightmare">The Multilingual Nightmare</h3>

<p>Expanding internationally meant exponentially more work. English restaurant topics weren’t the same as Spanish restaurant topics—cultural differences meant different things mattered. We either had to compromise on coverage or maintain separate topic lists per language. Each language and region needed its own carefully curated taxonomy.</p>

<p>We found ourselves maintaining hundreds of topic lists, each requiring expert maintenance. It was unsustainable.</p>

<h2 id="the-turning-point-what-if-topics-could-discover-themselves">The Turning Point: What If Topics Could Discover Themselves?</h2>

<p>One day, someone asked the obvious question: “What if we just let the AI figure out what topics exist?”</p>

<p>It seemed radical. Our entire system was built on human-curated topics. But the more we thought about it, the more sense it made.</p>

<p>Instead of telling the AI “These are the topics, find them in the text,” what if we said “Here’s the text, tell us what topics exist”?</p>

<h3 id="the-new-vision">The New Vision</h3>

<p>We needed a system that could:</p>

<p>✅ <strong>Analyze any domain immediately</strong> - No weeks of topic curation<br />
✅ <strong>Discover emerging topics automatically</strong> - See new patterns as they appear<br />
✅ <strong>Provide multiple levels of detail</strong> - Zoom from strategic themes to operational specifics<br />
✅ <strong>Work across languages</strong> - Understand meaning, not just words<br />
✅ <strong>Keep getting smarter</strong> - Learn and adapt as more data comes in</p>

<p>This meant rebuilding our entire approach from the ground up.</p>

<h2 id="building-the-new-system-from-classification-to-discovery">Building the New System: From Classification to Discovery</h2>

<h3 id="the-fundamental-shift">The Fundamental Shift</h3>

<ul>
  <li><strong>Discover’s approach:</strong> “Here are the topics. Match text to them.”</li>
  <li><strong>AI Topics approach:</strong> “Here’s the text. Discover the topics within it.”</li>
</ul>

<p>The technical term is moving from supervised classification to unsupervised clustering. But what really matters is the user experience transformation:</p>

<p><strong>Before:</strong> Project kickoff → 3 weeks of topic curation → Data processing → Analysis<br />
<strong>After:</strong> Project kickoff → Upload data → Topics discovered automatically → Analysis</p>

<h3 id="how-automatic-topic-discovery-works">How Automatic Topic Discovery Works</h3>

<p>AI Topics uses <strong>semantic embeddings</strong>—a way of converting text into mathematical representations that capture meaning. Here’s why this matters:</p>

<p>When Discover saw “fast service” and “quick response,” it treated them as potentially different topics because they used different words.</p>

<p>AI Topics understands they mean the same thing. It converts both phrases into similar mathematical representations, so they naturally cluster together as one topic.</p>

<p>This happens automatically. No one needs to tell the system that “fast” and “quick” are synonyms, or that “service” and “response” are related in customer feedback contexts. The AI learns semantic relationships from the data itself.</p>
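To make the idea concrete, here is a toy sketch of how cosine similarity over embedding vectors captures “same meaning, different words.” The three-dimensional vectors below are hand-picked stand-ins, not real embeddings; in production a sentence-embedding model produces vectors with hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hand-picked toy vectors standing in for real sentence embeddings.
embeddings = {
    "fast service":   [0.9, 0.1, 0.0],
    "quick response": [0.8, 0.2, 0.1],
    "dirty tables":   [0.0, 0.1, 0.9],
}

same = cosine_similarity(embeddings["fast service"], embeddings["quick response"])
different = cosine_similarity(embeddings["fast service"], embeddings["dirty tables"])
assert same > different  # semantically similar phrases sit closer together
```

Phrases that share no words but mean the same thing end up near each other in the vector space, so clustering groups them into one topic.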

<h3 id="the-magic-of-bertopic-clustering">The Magic of BERTopic Clustering</h3>

<p>We chose <strong>BERTopic</strong> as our clustering engine—a state-of-the-art topic modeling framework that combines the best of modern NLP with robust clustering algorithms.</p>

<p>Here’s the process in simple terms:</p>

<ol>
  <li><strong>Understanding</strong> - Every document gets converted into a mathematical representation that captures its meaning using transformer-based embeddings</li>
  <li><strong>Dimensionality Reduction</strong> - Using UMAP (Uniform Manifold Approximation and Projection), we reduce the complexity while preserving the semantic structure of the data</li>
  <li><strong>Finding Patterns</strong> - HDBSCAN (Hierarchical Density-Based Spatial Clustering) identifies natural groupings in the data—documents talking about the same things cluster together automatically</li>
  <li><strong>Initial Labeling</strong> - BERTopic generates preliminary, raw topic names based on the most statistically distinctive terms in each cluster (e.g., ‘quick_fast_rapid’)—a necessary but often unintuitive first step</li>
  <li><strong>Organizing</strong> - Topics arrange themselves hierarchically, from broad themes down to specific issues</li>
</ol>
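The five steps above map directly onto how a BERTopic pipeline is typically wired together. The sketch below is a configuration illustration, not our production setup: the model name and parameter values are illustrative defaults, and it assumes the <code class="language-plaintext highlighter-rouge">bertopic</code>, <code class="language-plaintext highlighter-rouge">umap-learn</code>, <code class="language-plaintext highlighter-rouge">hdbscan</code>, and <code class="language-plaintext highlighter-rouge">sentence-transformers</code> packages.

```python
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer
from umap import UMAP
from hdbscan import HDBSCAN

# Step 1: transformer-based embeddings capture document meaning.
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

# Step 2: UMAP reduces dimensionality while preserving semantic structure.
umap_model = UMAP(n_neighbors=15, n_components=5, min_dist=0.0, metric="cosine")

# Step 3: HDBSCAN finds dense clusters; documents that fit no cluster
# are labeled as outliers (topic -1).
hdbscan_model = HDBSCAN(min_cluster_size=15, metric="euclidean",
                        prediction_data=True)

topic_model = BERTopic(
    embedding_model=embedding_model,
    umap_model=umap_model,
    hdbscan_model=hdbscan_model,
)

# Steps 4-5: fitting assigns each document a topic, derives raw keyword
# labels per cluster, and exposes a topic hierarchy.
# topics, probabilities = topic_model.fit_transform(documents)
```

The fit call is left commented because it needs a real corpus; with one in hand, the whole taxonomy falls out of a single call.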

<p>The beautiful part? This all happens automatically. Upload your data, and within hours (not weeks), you have a complete topic taxonomy discovered from the content itself.</p>

<p><img src="/engineering/assets/images/journey-topic-analysis-textai.png" alt="Text AI Emerging Topic" /></p>

<h3 id="smart-configuration-one-size-doesnt-fit-all">Smart Configuration: One Size Doesn’t Fit All</h3>

<p>Here’s something crucial we learned: the optimal clustering parameters change depending on your data size.</p>

<p>Clustering 100 documents requires different settings than clustering 100,000 documents. BERTopic uses UMAP and HDBSCAN under the hood, both of which have parameters that dramatically affect results.</p>

<p><strong>Our solution:</strong> Dynamic configuration selection.</p>

<p>AI Topics automatically adjusts UMAP and HDBSCAN parameters based on the number of texts you’re analyzing:</p>

<ul>
  <li><strong>Small datasets</strong> (hundreds of documents): More aggressive parameter settings to ensure patterns emerge even with limited data</li>
  <li><strong>Medium datasets</strong> (thousands of documents): Balanced parameters that capture both broad themes and specific nuances</li>
  <li><strong>Large datasets</strong> (tens of thousands+): Conservative parameters that prevent over-fragmentation and maintain interpretable clusters</li>
</ul>

<p>This adaptive approach means you get quality results whether you’re analyzing a week’s worth of customer feedback or a decade’s worth of archived documents. No manual tuning required—AI Topics knows what works best for your data volume.</p>
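A simplified version of that selection logic might look like the following. The size thresholds and parameter values here are illustrative stand-ins, not the tuned values our system actually uses.

```python
def select_clustering_params(n_docs: int) -> dict:
    """Pick UMAP/HDBSCAN parameters by dataset size (illustrative values)."""
    if n_docs < 1_000:
        # Small: aggressive settings so patterns emerge from limited data.
        return {"n_neighbors": 5, "min_cluster_size": 5, "min_samples": 3}
    if n_docs < 20_000:
        # Medium: balance broad themes against specific nuances.
        return {"n_neighbors": 15, "min_cluster_size": 15, "min_samples": 10}
    # Large: conservative settings to prevent over-fragmentation.
    return {"n_neighbors": 30, "min_cluster_size": 50, "min_samples": 25}

small = select_clustering_params(300)
large = select_clustering_params(100_000)
assert small["min_cluster_size"] < large["min_cluster_size"]
```

The returned dictionary would then feed the UMAP and HDBSCAN constructors, so the same pipeline code serves every data volume.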

<h3 id="what-about-discovers-classification-strength">What About Discover’s Classification Strength?</h3>

<p>Here’s the good news: we didn’t throw away what made Discover great.</p>

<p>AI Topics includes an <code class="language-plaintext highlighter-rouge">available_taxonomy</code> operation that works exactly like Discover—but enhanced with modern AI. If you have predefined topics you want to use (maybe you’re in a well-established domain with known categories), you can still use them. The system will classify your text against those topics using the same sophisticated extraction methods Discover offered.</p>

<p>The difference? Now you have a choice. Use predefined topics when they make sense, or let the system discover topics when exploring new territory.</p>

<h2 id="the-migration-journey-challenges-we-faced">The Migration Journey: Challenges We Faced</h2>

<h3 id="challenge-1-speed-vs-intelligence">Challenge 1: Speed vs. Intelligence</h3>

<p>Discover was fast—about 50 milliseconds per document. Why? Because checking text against a predefined list is computationally simple.</p>

<p>AI Topics’ automatic discovery is more complex. Converting text to semantic embeddings and running clustering algorithms takes more time—about 100 milliseconds per document for batch processing.</p>

<p><strong>Our solution:</strong> A two-tier approach.</p>

<ul>
  <li>For <strong>real-time topic assignment</strong> (when you need answers now), the system assigns new documents to existing topics quickly—just as fast as Discover.</li>
  <li>For <strong>topic discovery and refinement</strong> (when you’re exploring patterns), the system runs more thorough analysis in the background. You submit a job, get an immediate confirmation, and receive results when the deep analysis completes.</li>
</ul>

<p>Most users don’t notice the difference. They get fast responses for daily operations and thorough insights for strategic analysis.</p>
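The fast path can be sketched as nearest-centroid assignment: compare a new document’s embedding against the centroid of each already-discovered topic and pick the closest, with no re-clustering. The topic names and vectors below are toy stand-ins.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

# Centroids of topics found earlier by the slower batch clustering run.
topic_centroids = {
    "Service Speed": [0.9, 0.1, 0.0],
    "Food Quality":  [0.1, 0.9, 0.0],
    "Cleanliness":   [0.0, 0.1, 0.9],
}

def assign_topic(doc_embedding):
    """Real-time path: one pass over centroids, no re-clustering."""
    return max(topic_centroids,
               key=lambda t: cosine(doc_embedding, topic_centroids[t]))

assert assign_topic([0.85, 0.2, 0.05]) == "Service Speed"
```

Discovery and refinement stay in the background job; this cheap lookup is all the real-time path has to do.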

<h3 id="challenge-2-making-sense-of-discovered-topicsenter-openai">Challenge 2: Making Sense of Discovered Topics—Enter OpenAI</h3>

<p>Discover had an unfair advantage: humans named the topics. “Customer Service Quality” is immediately clear to anyone.</p>

<p>Automatically discovered topics? Sometimes the labels were… less intuitive. Early BERTopic attempts produced gems like:</p>

<ul>
  <li>Cluster 1: “good_great_excellent_amazing_fantastic”</li>
  <li>Cluster 2: “quick_fast_rapid_speedy_prompt”</li>
  <li>Cluster 3: “wait_long_slow_delay_forever”</li>
</ul>

<p>Technically accurate—these words did cluster together. But not exactly boardroom-ready.</p>

<p><strong>The breakthrough:</strong> OpenAI representation labeling.</p>

<p>Instead of relying on simple keyword extraction, we integrated OpenAI’s language models to generate intelligent, contextual topic labels. The AI reads the most representative documents in each cluster and creates human-like descriptions of what the topic is actually about.</p>

<p>Those same clusters transformed:</p>

<ul>
  <li>Cluster 1 → “Overall Satisfaction and Positive Experiences”</li>
  <li>Cluster 2 → “Service Speed and Efficiency”</li>
  <li>Cluster 3 → “Wait Times and Service Delays”</li>
</ul>

<p>But it gets better. OpenAI’s understanding of context means labels adapt to your domain:</p>

<ul>
  <li>In restaurant feedback, a cluster about “noise” becomes → “Ambiance and Noise Levels”</li>
  <li>In hospital feedback, the same semantic cluster becomes → “Patient Rest and Recovery Environment”</li>
</ul>

<p>The AI understands that identical concepts need different framing depending on context.</p>

<p>The quality improvement was dramatic. In blind tests with business stakeholders, OpenAI-generated labels were preferred over human-curated topic names 73% of the time. Why? Because the AI had read every single document and distilled the essence, while humans were working from assumptions and samples.</p>

<p>Users can still refine labels manually if needed, but most of the time, the OpenAI-generated names are immediately usable.</p>

<h3 id="challenge-3-optimizing-bertopic-for-different-data-volumes">Challenge 3: Optimizing BERTopic for Different Data Volumes</h3>

<p>BERTopic is powerful, but it’s not a one-size-fits-all solution. The same clustering parameters that work beautifully for 500 documents can produce terrible results for 50,000 documents.</p>

<p><strong>The challenge:</strong> UMAP and HDBSCAN (the algorithms powering BERTopic’s clustering) have parameters that need careful tuning based on data characteristics. Set them too conservatively, and small datasets won’t reveal any patterns. Set them too aggressively, and large datasets fragment into hundreds of meaningless micro-topics.</p>

<p><strong>Our solution:</strong> Intelligent parameter adaptation.</p>

<p>We built logic that analyzes your dataset size and automatically selects optimal UMAP and HDBSCAN configurations:</p>

<ul>
  <li>For smaller datasets, the system uses parameters that encourage pattern discovery even when data is limited. This prevents the “everything is one big topic” problem.</li>
  <li>For medium datasets, it balances between capturing nuanced sub-topics and maintaining broad thematic groupings.</li>
  <li>For larger datasets, it applies conservative settings that prevent over-fragmentation while still revealing meaningful structure.</li>
</ul>

<p><strong>The result?</strong> Whether you’re analyzing 200 customer reviews or 200,000, you get coherent, interpretable topics without manual parameter tuning. The system adapts to your data automatically.</p>

<h3 id="challenge-4-when-topics-evolve">Challenge 4: When Topics Evolve</h3>

<p>Real-world topics don’t stay still. During COVID, “service” discussions shifted dramatically—from “Is the staff friendly?” to “Are safety protocols being followed?”</p>

<p>AI Topics needed to detect these shifts and adapt. We built monitoring capabilities that track:</p>

<ul>
  <li>How topic distributions change over time</li>
  <li>When topic coherence starts degrading (a sign that a topic is splitting or evolving)</li>
  <li>When new patterns emerge that don’t fit existing clusters</li>
</ul>

<p>When significant drift is detected, AI Topics can trigger retraining to refine topic boundaries. This keeps the analysis relevant as the world changes.</p>
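One simple drift signal is the change in topic share between two time windows: if the distribution has moved more than some threshold, flag the model for retraining. The sketch below uses total variation distance; the distributions and the threshold are illustrative, not our production values.

```python
def total_variation(p: dict, q: dict) -> float:
    """Half the L1 distance between two topic-share distributions."""
    topics = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in topics)

# Share of documents per topic in two consecutive windows.
last_month = {"service": 0.50, "food": 0.40, "safety protocols": 0.10}
this_month = {"service": 0.25, "food": 0.30, "safety protocols": 0.45}

drift = total_variation(last_month, this_month)
needs_retraining = drift > 0.2  # illustrative threshold
assert needs_retraining
```

Cluster-coherence scores and the rate of outlier (unclustered) documents can feed the same trigger, so the taxonomy refreshes only when the data actually shifts.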

<h2 id="the-advantages-what-changed-for-the-better">The Advantages: What Changed for the Better</h2>

<h3 id="dramatically-better-topic-quality">Dramatically Better Topic Quality</h3>

<p>The most surprising benefit wasn’t just eliminating manual curation—it was discovering that automatically generated topics were often better than human-curated ones.</p>

<p><strong>Real impact:</strong> A retail client came to us with Black Friday feedback data. With Discover, their domain experts had created 47 predefined topics like “checkout experience,” “product quality,” and “delivery speed.”</p>

<p>Our automatic discovery found 63 topics—a 34% increase in thematic coverage—and many revealed insights the manual taxonomy had missed entirely:</p>

<ul>
  <li>“Gift wrapping delays” (a Black Friday-specific concern)</li>
  <li>“Size inconsistency across brands” (an operational issue causing returns)</li>
  <li>“Mobile app cart abandonment” (technical friction the IT team needed to know about)</li>
</ul>

<p>The human experts weren’t wrong—they just couldn’t anticipate every nuance hidden in thousands of customer comments. The AI could.</p>

<p><img src="/engineering/assets/images/journey-topic-analysis-image-1.png" alt="Manual vs AI topics" /></p>

<h3 id="never-miss-emerging-topics-again">Never Miss Emerging Topics Again</h3>

<p>AI Topics continuously monitors for new patterns. When enough documents start discussing something novel, it surfaces as a new topic cluster automatically.</p>

<p><strong>Example:</strong> A hotel chain client suddenly started receiving feedback about “EV charging stations.” This wasn’t in anyone’s original topic taxonomy because it wasn’t a common concern when the project started. AI Topics identified it as an emerging topic within days, allowing the hotel chain to prioritize installing charging stations at high-demand locations.</p>

<p>Discover would have missed this entirely until the next quarterly taxonomy review.</p>

<h3 id="higher-quality-topic-granularity">Higher Quality Topic Granularity</h3>

<p>Discover required choosing granularity upfront: broad categories or specific issues? You couldn’t have both without running separate analyses.</p>

<p>AI Topics’ automatic discovery naturally creates multiple granularity levels simultaneously because it uses hierarchical clustering in BERTopic:</p>

<ul>
  <li><strong>Level 1 (Strategic):</strong> “Guest Experience Quality”
    <ul>
      <li><strong>Level 2 (Departmental):</strong> “Room Comfort,” “Service Responsiveness,” “Facility Amenities”</li>
    </ul>
  </li>
</ul>

<p>What makes this powerful is that the AI determines the right number of levels and topics at each level based on the actual data patterns—not human assumptions.</p>

<p>A retail client with seasonal products found their topic hierarchy changed across seasons. Summer analysis had deep sub-topics around “outdoor furniture durability,” while winter analysis had completely different granular topics around “indoor heating products.” The AI adapted automatically.</p>

<p><strong>The insight:</strong> Different datasets need different structures. Forcing a predefined hierarchy misses these natural variations.</p>

<h3 id="semantic-search-that-actually-understands">Semantic Search That Actually Understands</h3>

<ul>
  <li><strong>Discover search:</strong> “Find documents about fast service” → searches for the exact words “fast” and “service”</li>
  <li><strong>AI Topics search:</strong> Understands you’re looking for concepts about service speed and finds:
    <ul>
      <li>“the staff was very efficient”</li>
      <li>“quick response to our requests”</li>
      <li>“didn’t have to wait long”</li>
      <li>“prompt attention from servers”</li>
    </ul>
  </li>
</ul>

<p>None of these contain your exact search terms, but they’re all semantically related to what you’re looking for.</p>
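The difference can be sketched as ranking by embedding similarity instead of keyword overlap. The two-dimensional vectors are toy stand-ins for real embeddings; notice the top hit shares no words with the query.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

# Toy embeddings: a real system gets these from a sentence-embedding model.
corpus = {
    "the staff was very efficient": [0.9, 0.1],
    "lovely view from the balcony": [0.1, 0.9],
}
query_text, query_vec = "fast service", [0.95, 0.05]

ranked = sorted(corpus, key=lambda doc: cosine(corpus[doc], query_vec),
                reverse=True)
assert ranked[0] == "the staff was very efficient"  # zero shared words
```

A keyword search for “fast service” would return neither document; semantic ranking surfaces the right one first.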

<p><strong>Real impact:</strong> A customer support team reported finding 40% more relevant documents for issue investigation. Problems they thought were isolated turned out to have been discussed repeatedly using different terminology.</p>

<h3 id="true-multilingual-understanding">True Multilingual Understanding</h3>

<p>Instead of maintaining separate topic taxonomies for each language, AI Topics uses multilingual embeddings. English “excellent food” and Spanish “comida excelente” are understood as the same concept automatically.</p>

<p><strong>Real impact:</strong> A global hotel chain operating in 15 countries went from maintaining 15 separate topic systems to one unified system that understands all their languages. Cross-cultural insights that were previously impossible became routine.</p>

<h2 id="what-we-learned-along-the-way">What We Learned Along the Way</h2>

<h3 id="the-importance-of-human-touch">The Importance of Human Touch</h3>

<p>Fully automated doesn’t mean fully hands-off. The best results come from a partnership between AI and human intelligence.</p>

<ul>
  <li><strong>Topic labeling</strong> benefits enormously from occasional human review. The system might generate “speed_efficiency_temporal” as a cluster name. A human immediately sees this should be “Service Speed.”</li>
  <li><strong>Quality validation</strong> requires human judgment. Is “parking” a subset of “facilities” or its own top-level topic? Depends on your business context.</li>
</ul>

<h3 id="start-simple-add-complexity-gradually">Start Simple, Add Complexity Gradually</h3>

<p>Our first automatic clustering was overly sophisticated. We used every advanced technique we could think of. It was slow and hard to debug when something went wrong.</p>

<p>Stepping back to simpler approaches—good embeddings with straightforward clustering—delivered 90% of the value with 10% of the complexity. We added sophistication only where it clearly improved results.</p>

<h3 id="the-value-of-explaining-the-ai">The Value of Explaining the AI</h3>

<p>Users were initially skeptical. “How can the AI know what topics are important if we don’t tell it?”</p>

<p>Building transparency helped enormously:</p>

<ul>
  <li>Show representative documents for each topic</li>
  <li>Display confidence scores for topic assignments</li>
  <li>Explain why documents clustered together</li>
  <li>Provide side-by-side comparisons of automatic vs. manual taxonomies</li>
</ul>

<p>When people understood how the discovery worked, trust followed.</p>

<h2 id="looking-ahead-whats-next">Looking Ahead: What’s Next</h2>

<h3 id="smarter-topic-evolution-tracking">Smarter Topic Evolution Tracking</h3>

<p>Right now, the system detects that topics are changing. Next, we want to visualize how they’re changing:</p>

<ul>
  <li>Topics splitting into sub-topics</li>
  <li>Topics merging as concepts converge</li>
  <li>Topics appearing and disappearing over time</li>
</ul>

<p>Imagine a time-lapse showing how customer concerns evolved throughout a product lifecycle.</p>

<h3 id="predictive-intelligence-from-what-to-why">Predictive Intelligence: From What to Why</h3>

<p>Moving beyond “What topics exist?” to “What topics actually matter?”</p>

<ul>
  <li>Which topic emerging in today’s conversations predicts a 15% rise in customer churn next month?</li>
  <li>Which topics drive repeat purchases?</li>
  <li>Which topics are early indicators of viral content?</li>
</ul>

<p>Connecting topic analysis to business outcomes would transform it from descriptive analytics to predictive intelligence.</p>

<h3 id="real-time-topic-monitoring">Real-Time Topic Monitoring</h3>

<p>Imagine a live dashboard showing topics as they emerge in real-time—not hours after data upload, but streaming as new documents arrive. Social media monitoring could surface trending topics within minutes of them starting to appear.</p>

<h2 id="the-bottom-line-different-tools-for-different-jobs">The Bottom Line: Different Tools for Different Jobs</h2>

<p>Discover wasn’t wrong—it was optimized for a specific use case. When you:</p>

<ul>
  <li>Work in a well-understood domain</li>
  <li>Have stable topic requirements</li>
  <li>Need maximum processing speed</li>
  <li>Want human-validated topic names</li>
</ul>

<p>Then predefined topic classification (like Discover’s approach) is actually the better choice. And with AI Topics’ <code class="language-plaintext highlighter-rouge">available_taxonomy</code> operation, you can still use this approach when it makes sense.</p>

<p>But when you:</p>

<ul>
  <li>Explore new domains without established taxonomies</li>
  <li>Need to discover emerging patterns</li>
  <li>Want to understand semantic relationships</li>
  <li>Require multilingual analysis</li>
  <li>Need strategic-to-tactical granularity</li>
</ul>

<p>Then automatic topic discovery is transformative.</p>

<p>We’re not declaring victory over an old system. We’re celebrating having the right tools for different situations.</p>

<h2 id="the-real-achievement">The Real Achievement</h2>

<p>The technical accomplishments are meaningful: BERTopic clustering, semantic embeddings, OpenAI representation labeling, adaptive parameter configuration, hierarchical topic trees, multilingual understanding.</p>

<p>But the real achievement is qualitative: <strong>topics that capture nuances human experts miss while remaining immediately understandable.</strong></p>

<p>The combination of BERTopic’s sophisticated clustering and OpenAI’s contextual labeling creates something neither could achieve alone:</p>

<ul>
  <li><strong>BERTopic finds the patterns</strong></li>
  <li><strong>OpenAI makes them comprehensible</strong></li>
  <li><strong>Together, they reveal insights hiding in plain sight</strong></li>
</ul>

<p>When someone says “We have a new dataset to analyze,” the conversation isn’t just about eliminating setup time. It’s about discovering what’s actually in the data, not just confirming what we thought would be there.</p>

<p>That’s the shift that matters. From AI that classifies text into boxes we define, to AI that shows us which boxes actually exist—and labels them in ways we immediately understand.</p>

<h2 id="technical-summary">Technical Summary</h2>

<ul>
  <li><strong>Discover:</strong> Supervised topic classification with predefined taxonomies and human-curated labels</li>
  <li><strong>AI Topics:</strong> BERTopic-based unsupervised discovery with OpenAI representation labeling</li>
  <li><strong>Best of Both:</strong> <code class="language-plaintext highlighter-rouge">available_taxonomy</code> operation preserves classification capabilities when needed</li>
</ul>

<h3 id="key-quality-improvements">Key Quality Improvements</h3>

<ul>
  <li>Discovery of nuanced topics human experts didn’t anticipate</li>
  <li>Context-aware, business-ready labeling through OpenAI integration</li>
  <li>Adaptive topic granularity based on natural data patterns</li>
  <li>Superior semantic understanding for better search and grouping</li>
  <li>Unified multilingual analysis maintaining consistent quality across languages</li>
</ul>

<p><strong>The Philosophy:</strong> Let humans focus on interpretation and strategy. Let AI handle pattern discovery, intelligent labeling, and routine classification.</p>]]></content><author><name>Pratik Dhulubulu</name></author><category term="AI" /><category term="Machine Learning" /><category term="NLP" /><category term="Text Analysis" /><category term="topic-modeling" /><category term="bertopic" /><category term="openai" /><category term="clustering" /><category term="AI" /><summary type="html"><![CDATA[The Beginning: Discover and How We Used to Work]]></summary></entry><entry><title type="html">How QuestionPro Has Impacted My Tech Career</title><link href="https://www.questionpro.com/engineering/personal%20growth/mindset%20shift/continuous%20learning/professional%20development/self-reflection/how-questionpro-has-impacted-my-tech-career/" rel="alternate" type="text/html" title="How QuestionPro Has Impacted My Tech Career" /><published>2025-10-14T00:00:00+00:00</published><updated>2025-10-14T00:00:00+00:00</updated><id>https://www.questionpro.com/engineering/personal%20growth/mindset%20shift/continuous%20learning/professional%20development/self-reflection/how-questionpro-has-impacted-my-tech-career</id><content type="html" xml:base="https://www.questionpro.com/engineering/personal%20growth/mindset%20shift/continuous%20learning/professional%20development/self-reflection/how-questionpro-has-impacted-my-tech-career/"><![CDATA[<p>It has been three years since I joined QuestionPro, and during this time, the journey has been nothing short of transformative. When I look back, I realize that what has changed the most is <strong>my mindset</strong>—towards problems, towards products, and even towards people. Earlier in my career, I focused only on solving a technical issue in isolation. Today, I think about why a problem exists, how it affects the product as a whole, and what impact it has on the people using it.</p>

<hr />

<h2 id="mindset-shift">Mindset Shift</h2>

<p>The first and perhaps the most profound transformation I’ve experienced is a <strong>shift in mindset</strong>.</p>

<p>When I started my career, my perspective was narrow:</p>

<ul>
  <li>I saw tasks as checkboxes to be completed.</li>
  <li>I measured success by how quickly I could deliver code.</li>
  <li>I viewed problems as obstacles, not opportunities.</li>
</ul>

<p>At QuestionPro, I learned to see beyond the code. Problems are no longer roadblocks—they are <strong>signals</strong> pointing toward hidden issues, unmet user needs, or areas of improvement in our systems. A bug isn’t just “something broken”; it’s a chance to refine, to rethink, and to deliver a better user experience.</p>

<p>I also began to see features differently. A feature is not just a “requirement” to be implemented but part of a <strong>larger story</strong>. Every line of code we write contributes to a user’s journey, and every decision we make leaves an imprint on how someone interacts with the product. This realization changed how I approach design, architecture, and collaboration.</p>

<p>Most importantly, this mindset shift extended to how I see <strong>my own role</strong>. As a developer, my contribution is not just technical—it’s strategic. It’s about asking the right questions:</p>

<ul>
  <li><em>Why does this matter for the user?</em></li>
  <li><em>How does this tie into the product vision?</em></li>
  <li><em>What impact will this decision have six months down the road?</em></li>
</ul>

<p>This way of thinking has made me not just a better developer, but a more thoughtful professional. It has given me resilience too—because when you see challenges as opportunities, the journey itself becomes rewarding.</p>

<hr />

<h2 id="scope-of-growth">Scope of Growth</h2>

<p>If I were to describe the most valuable gift I’ve received here, it would be <strong>growth</strong>—and not just in technical skills. The growth I’ve experienced spans multiple areas:</p>

<ul>
  <li>From <strong>technical expertise</strong> to <strong>communication</strong></li>
  <li>From <strong>cultural awareness</strong> to <strong>people management</strong></li>
  <li>From understanding a <strong>single feature</strong> to comprehending the <strong>entire product lifecycle</strong></li>
</ul>

<p>This wide range of opportunities is rare, and it has been sustained by a few core practices that stand out to me:</p>

<h3 id="learning-environment">Learning Environment</h3>

<p>A healthy learning environment is not a luxury; it’s a necessity. QuestionPro has made it possible for us to learn without fear of judgment. Mistakes are seen as part of the process, and curiosity is encouraged.</p>

<h3 id="collective-learning">Collective Learning</h3>

<p>Learning collectively has been a breakthrough. When we share our discoveries, failures, and ideas, the knowledge spreads faster and deeper than any documentation ever could. I’ve seen how this practice saves countless hours across the team and also builds trust.</p>

<h3 id="continuous-push-and-improvement">Continuous Push and Improvement</h3>

<p>Borrowing from the CI/CD mindset, growth works the same way: push continuously, improve continuously. Growth is like resistance training—it’s not always comfortable, but it’s the only way to get stronger. At QuestionPro, this philosophy is embedded in both personal and professional development.</p>

<hr />

<h2 id="understanding-the-products-goal">Understanding the Product’s Goal</h2>

<p>One of the most important lessons I’ve learned is the <strong>value of understanding the product’s larger goal</strong>.</p>

<p>Think about it this way: the best doctors don’t just treat a symptom; they treat the body as a whole. Similarly, when we build a feature, we shouldn’t just focus on a ticket or task in isolation. Instead, we need to connect it back to the overall product vision.</p>

<p>This approach has helped me identify edge cases more effectively, think from the <strong>user’s perspective</strong> rather than just the <strong>developer’s perspective</strong>, and write code that reduces bugs in the long run. It’s not just about solving what’s in front of you; it’s about anticipating what lies ahead.</p>

<hr />

<h2 id="cultural-fitness">Cultural Fitness</h2>

<p>Early in my career, I underestimated the importance of cultural fitness. At QuestionPro, I realized how critical it is.</p>

<p>Working in harmony requires <strong>decency, mutual respect, and approachability</strong>. These may sound simple, but when a team of 15 developers embodies these traits, the impact on collaboration and delivery is remarkable. On the flip side, even one person who doesn’t align with the cultural norms can affect the morale and the outcomes of the entire group.</p>

<p>Cultural fitness isn’t about uniformity—it’s about respect, openness, and creating an environment where everyone feels safe to contribute.</p>

<hr />

<h2 id="communication">Communication</h2>

<p>For a long time, I thought communication was simply about fluency in English. QuestionPro changed that belief completely.</p>

<p>Effective communication is about <strong>clarity, context, and empathy</strong>. Some practices I’ve picked up include:</p>

<ul>
  <li>Being specific with time commitments<br />
<em>Example: “I’ll ping you within 10 minutes” instead of “I’ll ping you soon.”</em></li>
  <li>Surfacing blockers early<br />
<em>Example: “I’ve been stuck here for 30 minutes; can you pair with me?”</em></li>
</ul>

<p>These small practices create big changes in how smoothly a team functions. Communication is not about sounding good—it’s about making sure the other person understands you without confusion.</p>

<hr />

<h2 id="people-management">People Management</h2>

<p>Finally, one of the biggest growth areas for me has been <strong>managing people</strong>. Leading a team of 15 developers is a responsibility that requires more than technical know-how. It requires empathy, patience, and the ability to recognize each individual’s strengths.</p>

<p>At QuestionPro, I’ve learned that good people management is less about directing and more about <strong>enabling</strong>. My role is not to dictate solutions but to remove obstacles, foster trust, and empower others to do their best work.</p>

<p>Sometimes this means mediating between two teammates with differing viewpoints. Sometimes it means mentoring juniors and helping them connect technical work with product thinking. And sometimes it simply means stepping back and letting others take the lead.</p>

<p>I’ve found that when people feel trusted and supported, their creativity and ownership naturally grow—and the product benefits tremendously as a result.</p>

<hr />

<h2 id="closing-thoughts">Closing Thoughts</h2>

<p>Looking back at these three years, I can say with confidence that QuestionPro has been more than just a workplace for me—it has been a place of transformation. The lessons on mindset, growth, cultural fitness, communication, and people management have not only shaped my career but also influenced how I approach challenges in life.</p>

<p>As I continue this journey, I hold onto one truth: growth is not a destination; it’s a continuous cycle of learning, sharing, and evolving. And for that, I remain grateful.</p>]]></content><author><name>Saiful Islam Khan</name></author><category term="Personal Growth" /><category term="Mindset Shift" /><category term="Continuous Learning" /><category term="Professional Development" /><category term="Self-Reflection" /><category term="personal growth" /><category term="mindset shift" /><category term="professional development" /><category term="product thinking" /><category term="leadership" /><category term="communication" /><category term="team culture" /><category term="people management" /><category term="continuous learning" /><category term="questionpro journey" /><summary type="html"><![CDATA[It has been three years since I joined QuestionPro, and during this time, the journey has been nothing short of transformative. When I look back, I realize that what has changed the most is my mindset—towards problems, towards products, and even towards people. Earlier in my career, I focused only on solving a technical issue in isolation. 
Today, I think about why a problem exists, how it affects the product as a whole, and what impact it has on the people using it.]]></summary></entry><entry><title type="html">How we created our Analytical System for Billions of Rows</title><link href="https://www.questionpro.com/engineering/software%20engineering/data%20engineering/how-we-created-our-analytical-system-for-billions-of-rows/" rel="alternate" type="text/html" title="How we created our Analytical System for Billions of Rows" /><published>2025-09-10T00:00:00+00:00</published><updated>2025-09-10T00:00:00+00:00</updated><id>https://www.questionpro.com/engineering/software%20engineering/data%20engineering/how-we-created-our-analytical-system-for-billions-of-rows</id><content type="html" xml:base="https://www.questionpro.com/engineering/software%20engineering/data%20engineering/how-we-created-our-analytical-system-for-billions-of-rows/"><![CDATA[<p>QuestionPro is one of the world’s leading enterprise survey platforms, which means we collect a significant amount of data every single day. As a data-centric company, we live and breathe numbers. To give you some perspective, we handle around 5 million requests daily, which translates to approximately 1.8 billion responses a year.</p>

<p>But raw requests don’t tell the whole story of our data footprint. When we look at what’s actually stored in our databases to understand user behavior and survey results, the scale is staggering: we’re talking about <strong>~500 billion rows of data</strong>.</p>

<p>Storing this volume in a single MySQL database instance is simply not feasible—it would grind to a halt. The data is partitioned across different MySQL databases, allowing us to scale horizontally.</p>

<p><img src="/engineering/assets/images/analytical_system_image1.png" alt="MySQL shards architecture" /></p>

<h3 id="the-bottleneck">The Bottleneck</h3>

<p>When we fired analytical queries that aggregate data using COUNT/SUM, we hit performance walls even with a fully sharded architecture. The slowness became particularly painful when analyzing data for a single, large survey. For surveys with over 10 million responses, it could take up to <strong>30 seconds to get an answer</strong>, which was unacceptable for our users.</p>

<p>We tried everything in the traditional playbook: optimizing indexes, tweaking configurations, and rewriting queries. Despite our best efforts, the benefit was minimal. We had to face the facts: we had squeezed every last drop of performance out of MySQL for our analytical needs. It was time to look for a new solution—a different kind of database.</p>

<h3 id="the-search-begins">The Search Begins</h3>

<p>Our quest for a new analytical database began. We had a clear understanding of the data we needed to analyze and the types of queries we wanted to run. We shortlisted several promising candidates:</p>

<ul>
  <li>MongoDB</li>
  <li>Elasticsearch</li>
  <li>Google BigQuery</li>
  <li>Apache Druid</li>
  <li>ClickHouse</li>
  <li>Apache Pinot</li>
</ul>

<p>Each database came with its own set of pros and cons. Some required more storage, others came with a higher cost, a few had complicated query languages, and some were simply slower than others in our tests.</p>

<p>After a thorough evaluation, one database emerged as the clear winner, solving our most critical problems: <strong>ClickHouse</strong>. The query performance was astounding. Aggregations on over 100 million records completed in milliseconds. We were thrilled.</p>

<h3 id="productionizing-clickhouse">Productionizing ClickHouse</h3>

<p>We designed our new ClickHouse system for resilience and scale from day one. It included replicas for redundancy and fault tolerance, along with multiple shards to continue partitioning our data and scaling horizontally.</p>

<p><img src="/engineering/assets/images/analytical_system_image2.png" alt="ClickHouse shards and replicas architecture" /></p>

<h3 id="whats-next-making-analytics-real-time">What’s Next? Making Analytics Real-Time</h3>

<p>With the performance problem solved, a new challenge emerged: real-time data availability. When users send out a survey, they want to see the results instantly, especially for new surveys with a small number of responses. This meant our analytical database needed to have near real-time data.</p>

<p><img src="/engineering/assets/images/analytical_system_image3.png" alt="Diagram showing the need to move data from MySQL to ClickHouse in real-time" /></p>

<p>We started looking for solutions to transfer data from our production MySQL databases to ClickHouse. The obvious approach was polling—periodically querying MySQL for new data. However, we quickly dismissed this, as it would place an unnecessary and constant load on our primary databases.</p>

<p>Instead, we found a much more elegant solution: <strong>MySQL binlogs</strong>. Binlogs are transaction logs that MySQL uses internally for replication. Essentially, every DML operation (INSERT, UPDATE, DELETE) creates an entry in these logs. We realized we could tap into this existing mechanism. Instead of replicating from MySQL to another MySQL server, we would replicate from <strong>MySQL to ClickHouse via Kafka</strong>.</p>

<p>To achieve this, we built a pipeline using two powerful systems:</p>

<ol>
  <li><strong>Debezium + Kafka Connect:</strong> This stack reads data directly from the MySQL binlogs and streams it into Kafka topics in real-time.</li>
  <li><strong>Kafka:</strong> Acting as a highly scalable, high-throughput pub-sub message bus.</li>
</ol>

<p>ClickHouse has the native ability to connect to Kafka and ingest data streams, and we could even perform transformations within ClickHouse itself. This gave us a complete, end-to-end ELT (Extract-Load-Transform) system. We built redundancy into every layer, from Kafka Connect to the Kafka brokers themselves.</p>
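<p>That native ingestion is typically wired up with a Kafka-engine table plus a materialized view. A sketch, with hypothetical table, column, broker, and topic names throughout (the post doesn’t show the real schema):</p>

```sql
-- Kafka-engine table: a consumer, not storage (names are illustrative)
CREATE TABLE responses_queue
(
    response_id UInt64,
    survey_id   UInt64,
    answer      String,
    ts_ms       UInt64
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list  = 'mysql.responses',
         kafka_group_name  = 'clickhouse-ingest',
         kafka_format      = 'JSONEachRow';

-- Materialized view: moves (and can transform) each message into real storage
CREATE MATERIALIZED VIEW responses_mv TO responses AS
SELECT response_id, survey_id, answer,
       fromUnixTimestamp64Milli(ts_ms) AS received_at
FROM responses_queue;
```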

<p>The result? We could transfer data from MySQL to ClickHouse in <strong>under one second</strong>.</p>

<p><img src="/engineering/assets/images/analytical_system_image4.png" alt="Data pipeline from MySQL Bin Logs to ClickHouse via Kafka Connect and Kafka" /></p>

<h3 id="was-it-this-smooth---hell-no">Was it this smooth? - Hell No</h3>

<p>Our first task was to change MySQL’s binlog format from <code class="language-plaintext highlighter-rouge">Mixed</code> to <code class="language-plaintext highlighter-rouge">Row</code>. This change was critical, as the <code class="language-plaintext highlighter-rouge">Row</code> format is required to read complete row data from the binary logs. It might seem straightforward on paper, but doing it across live production databases was not; even so, we executed the migration with zero downtime.</p>
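<p>For reference, the switch comes down to a couple of server settings, shown here as runtime statements (in practice they belong in the server configuration, rolled out with a restart plan):</p>

```sql
-- Debezium needs full row images in the binary log
SET GLOBAL binlog_format    = 'ROW';
SET GLOBAL binlog_row_image = 'FULL';  -- the default, but worth pinning explicitly
```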

<p>However, this solution introduced a new problem: the <code class="language-plaintext highlighter-rouge">Row</code> format dramatically increased storage consumption, as it records the entire row’s data for every single transaction. This storage issue became particularly apparent during the initial data snapshot. While the continuous data stream didn’t require much space, the initial bulk load would completely fill our Kafka storage.</p>

<p>Throughout this project, we faced and fixed numerous other issues. This would not have been possible without our robust monitoring and logging infrastructure. It provided the crucial early symptoms that allowed us to drill down and address the underlying root causes effectively.</p>

<h3 id="monitoring-and-observability">Monitoring and Observability</h3>

<p>A system this critical requires robust monitoring. We went beyond the basics of liveness, readiness, CPU, and memory usage. We implemented key performance indicators specific to our pipeline, including:</p>

<ul>
  <li><strong>End-to-end latency:</strong> The time taken for data to travel from MySQL to Kafka, and from Kafka to ClickHouse.</li>
  <li><strong>Throughput:</strong> The volume of data moving through the system, measured in bytes and messages per second.</li>
</ul>

<p>Thanks to this powerful and resilient architecture, we successfully built a blazing-fast BI and analytics system called “QuestionPro BI” capable of delighting our users with instant insights from billions of records.</p>

<p><img src="/engineering/assets/images/analytical_system_image5.png" alt="Diagram showing ClickHouse powering the QuestionPro BI dashboard" /></p>]]></content><author><name>Angadsingh Rajput</name></author><category term="Software Engineering" /><category term="Data Engineering" /><category term="analytics" /><category term="big data" /><category term="mysql" /><category term="clickhouse" /><category term="kafka" /><category term="debezium" /><category term="elt" /><category term="database architecture" /><summary type="html"><![CDATA[QuestionPro is one of the world’s leading enterprise survey platforms, which means we collect a significant amount of data every single day. As a data-centric company, we live and breathe numbers. To give you some perspective, we handle around 5 million requests daily, which translates to approximately 1.8 billion responses a year.]]></summary></entry><entry><title type="html">Building a Component Library: Lessons Learned</title><link href="https://www.questionpro.com/engineering/software%20engineering/frontend%20development/react/vite/building-component-library-lessons-learned/" rel="alternate" type="text/html" title="Building a Component Library: Lessons Learned" /><published>2025-08-28T00:00:00+00:00</published><updated>2025-08-28T00:00:00+00:00</updated><id>https://www.questionpro.com/engineering/software%20engineering/frontend%20development/react/vite/building-component-library-lessons-learned</id><content type="html" xml:base="https://www.questionpro.com/engineering/software%20engineering/frontend%20development/react/vite/building-component-library-lessons-learned/"><![CDATA[<p>At <a href="https://questionpro.com/">QuestionPro</a>, we are building a component library for all our satellite projects using React. During this process, we learned a lot about setting up a project and publishing it to npm. 
This is not a step-by-step tutorial but rather a collection of key takeaways that might help you if you’re working on a component library.</p>

<h2 id="0-why-an-in-house-component-library">0. Why an In-House Component Library?</h2>

<p>You might be asking, “In a sea of UI libraries for React, why on earth would we build our own?” That’s a fair question! The answer is pretty simple: <strong>consistency and efficiency</strong>. We have a bunch of products that all share the same design system and branding. With different teams working on different projects, it gets really tough to keep all those designs perfectly aligned.</p>

<p>So, to ensure our products have a consistent look and feel, we decided to build a common UI library that everyone can use. This means our teams can stop worrying about moving icons “1px to the left” and instead focus on what really matters: building amazing features and functionality.</p>

<h2 id="1-keep-it-simple">1. Keep It Simple</h2>

<p>A React component can be as simple as an HTML element. Avoid over-engineering components by forcing props for everything. For example, don’t pass text as a <code class="language-plaintext highlighter-rouge">label</code> prop when you can just use children. Raw HTML doesn’t work that way, so try to keep the developer experience as close to HTML as possible.</p>

<div class="language-tsx highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// ❌ Don't</span>
<span class="p">&lt;</span><span class="nc">Button</span> <span class="na">label</span><span class="p">=</span><span class="s">"A Button"</span> <span class="p">/&gt;</span>

<span class="c1">// ✅ Do</span>
<span class="p">&lt;</span><span class="nc">Button</span><span class="p">&gt;</span>A Button<span class="p">&lt;/</span><span class="nc">Button</span><span class="p">&gt;</span>
</code></pre></div></div>

<p><strong>Why This Matters:</strong>
Simplicity keeps the API intuitive and reduces developer friction. Your components should feel like a natural extension of HTML.</p>

<p><strong>Resource:</strong> <a href="https://react.dev/learn/writing-markup-with-jsx">React Docs – JSX in Depth</a></p>

<h2 id="2-extend-types-from-native-elements">2. Extend Types from Native Elements</h2>

<p>When building components, you don’t need to redefine every prop. Instead, extend from React’s native element types (yes, <code class="language-plaintext highlighter-rouge">div</code> or <code class="language-plaintext highlighter-rouge">button</code> in React aren’t “raw” HTML elements). This way, you automatically inherit all supported HTML attributes, including ARIA attributes. You can then layer custom props on top.</p>

<div class="language-tsx highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">interface</span> <span class="nx">ButtonProps</span> <span class="kd">extends</span> <span class="nx">React</span><span class="p">.</span><span class="nx">ButtonHTMLAttributes</span><span class="o">&lt;</span><span class="nx">HTMLButtonElement</span><span class="o">&gt;</span> <span class="p">{</span>
  <span class="na">variant</span><span class="p">:</span> <span class="dl">"</span><span class="s2">primary</span><span class="dl">"</span> <span class="o">|</span> <span class="dl">"</span><span class="s2">secondary</span><span class="dl">"</span> <span class="o">|</span> <span class="dl">"</span><span class="s2">danger</span><span class="dl">"</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">export</span> <span class="kd">const</span> <span class="nx">Button</span><span class="p">:</span> <span class="nx">React</span><span class="p">.</span><span class="nx">FC</span><span class="o">&lt;</span><span class="nx">ButtonProps</span><span class="o">&gt;</span> <span class="o">=</span> <span class="p">(</span><span class="nx">props</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span>
  <span class="k">return</span> <span class="p">&lt;</span><span class="nt">button</span> <span class="si">{</span><span class="p">...</span><span class="nx">props</span><span class="si">}</span><span class="p">&gt;&lt;/</span><span class="nt">button</span><span class="p">&gt;;</span>
<span class="p">};</span>
</code></pre></div></div>

<p><strong>Why This Matters:</strong>
It keeps your types aligned with React’s ecosystem, avoids reinventing the wheel, and makes your components more flexible and accessible by default.</p>

<p><strong>Resource:</strong> <a href="https://www.typescriptlang.org/docs/handbook/interfaces.html#extending-interfaces">TypeScript Handbook – Extending Types</a></p>

<h2 id="3-accessibility-matters">3. Accessibility Matters</h2>

<p>Accessibility should not be an afterthought. Use semantic HTML and proper roles. Don’t use a <code class="language-plaintext highlighter-rouge">div</code> to mimic a button — use a real <code class="language-plaintext highlighter-rouge">&lt;button&gt;</code>. Every component has its role. For example, an <code class="language-plaintext highlighter-rouge">&lt;input type="number" /&gt;</code> has the role <code class="language-plaintext highlighter-rouge">spinbutton</code>. Since components will be used in different contexts, make sure accessibility is baked in from the start.</p>

<div class="language-tsx highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// ❌ Don't</span>
<span class="k">export</span> <span class="kd">const</span> <span class="nx">Button</span> <span class="o">=</span> <span class="p">()</span> <span class="o">=&gt;</span> <span class="p">&lt;</span><span class="nt">div</span> <span class="na">onClick</span><span class="p">=</span><span class="si">{</span><span class="p">()</span> <span class="o">=&gt;</span> <span class="p">{}</span><span class="si">}</span><span class="p">&gt;&lt;/</span><span class="nt">div</span><span class="p">&gt;;</span>

<span class="c1">// ✅ Do</span>
<span class="k">export</span> <span class="kd">const</span> <span class="nx">Button</span> <span class="o">=</span> <span class="p">()</span> <span class="o">=&gt;</span> <span class="p">&lt;</span><span class="nt">button</span><span class="p">&gt;&lt;/</span><span class="nt">button</span><span class="p">&gt;;</span>
</code></pre></div></div>

<p><strong>Why This Matters:</strong>
Accessibility ensures your components can be used by everyone, including users with assistive technologies. It also aligns with legal and industry standards.</p>

<p><strong>Resource:</strong> <a href="https://www.w3.org/WAI/ARIA/apg/">WAI-ARIA Authoring Practices</a></p>

<h2 id="4-follow-conventional-commits">4. Follow Conventional Commits</h2>

<p>Conventional Commits provide a consistent way to write commit messages. Instead of vague messages like <code class="language-plaintext highlighter-rouge">button updated</code> or <code class="language-plaintext highlighter-rouge">misc changes</code>, use structured messages that can feed into automated changelog generation and versioning.
In practice:</p>

<ul>
  <li>Commit often.</li>
  <li>Keep commits component-scoped.</li>
  <li>Use descriptive prefixes: <code class="language-plaintext highlighter-rouge">feat(Button): add new color</code></li>
</ul>

<p><strong>Why This Matters:</strong>
A consistent commit history makes it easier to track changes, automate versioning, and generate clean changelogs.</p>

<p><strong>Resource:</strong> <a href="https://www.conventionalcommits.org/">Conventional Commits Specification</a></p>

<hr />

<h2 id="5-unit-tests-are-essential">5. Unit Tests Are Essential</h2>

<p>For a component library, unit tests are not optional — they’re essential. They ensure that functionality doesn’t break when new features are added. Bugs will inevitably slip in, but tests catch them early. Make sure every piece of functionality has at least basic test coverage. We used <strong>Vitest</strong> and <strong>React Testing Library</strong> for this.</p>

<p><strong>Why This Matters:</strong>
Tests protect your consumers. A single regression in your library could break dozens of projects — tests act as your safety net.</p>

<p><strong>Resources:</strong> <a href="https://vitest.dev/">Vitest Docs</a>, <a href="https://testing-library.com/docs/react-testing-library/intro/">React Testing Library Docs</a></p>

<h2 id="6-configuration-best-practices">6. Configuration Best Practices</h2>

<p>Two common mistakes when building libraries are:</p>

<ol>
  <li>
    <p><strong>Not making the library tree-shakable.</strong> Your bundler should be able to remove unused components automatically. Example:</p>

    <div class="language-ts highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// component/Button/index.ts</span>
<span class="k">export</span> <span class="p">{</span> <span class="nx">Button</span><span class="p">,</span> <span class="kd">type</span> <span class="nx">ButtonProps</span> <span class="p">}</span> <span class="k">from</span> <span class="dl">"</span><span class="s2">./Button.tsx</span><span class="dl">"</span><span class="p">;</span>

<span class="c1">// index.ts</span>
<span class="k">export</span> <span class="o">*</span> <span class="k">from</span> <span class="dl">"</span><span class="s2">@component/Button</span><span class="dl">"</span><span class="p">;</span>
</code></pre></div>    </div>
  </li>
  <li>
    <p><strong>Bundling dependencies into your build.</strong> Your library should only include your code, not external dependencies like React. Mark them as <code class="language-plaintext highlighter-rouge">peerDependencies</code> in <code class="language-plaintext highlighter-rouge">package.json</code>, and configure your bundler to treat them as external.</p>
  </li>
</ol>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">//</span><span class="w"> </span><span class="err">package.json</span><span class="w">
</span><span class="p">{</span><span class="w">
  </span><span class="nl">"peerDependencies"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"react"</span><span class="p">:</span><span class="w"> </span><span class="s2">"&gt;=18 &lt;20"</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<div class="language-ts highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// vite.config.ts</span>
<span class="k">export</span> <span class="k">default</span> <span class="nf">defineConfig</span><span class="p">({</span>
  <span class="na">build</span><span class="p">:</span> <span class="p">{</span>
    <span class="na">rollupOptions</span><span class="p">:</span> <span class="p">{</span>
      <span class="na">external</span><span class="p">:</span> <span class="p">[</span><span class="dl">"</span><span class="s2">react</span><span class="dl">"</span><span class="p">],</span>
      <span class="na">output</span><span class="p">:</span> <span class="p">{</span>
        <span class="na">globals</span><span class="p">:</span> <span class="p">{</span>
          <span class="na">react</span><span class="p">:</span> <span class="dl">"</span><span class="s2">React</span><span class="dl">"</span><span class="p">,</span>
        <span class="p">},</span>
      <span class="p">},</span>
    <span class="p">},</span>
  <span class="p">},</span>
<span class="p">});</span>
</code></pre></div></div>

<p><strong>Why This Matters:</strong>
Tree-shaking keeps your consumers’ bundles small, and marking dependencies as peers prevents multiple React versions or unnecessary bloat in downstream apps.</p>

<p><strong>Resources:</strong> <a href="https://rollupjs.org/guide/en/#tree-shaking">Tree-Shaking in Rollup</a>, <a href="https://vitejs.dev/guide/build.html#library-mode">Vite Library Mode Guide</a></p>

<p>Building a component library is less about fancy code and more about making it easy and reliable for others to use. Keep it simple, make it accessible, test it well — and you’ll already be ahead.</p>

<p>What lessons have you learned while building your own libraries? I’d love to hear your thoughts!</p>]]></content><author><name>Salauddin Omar Sifat</name></author><category term="Software Engineering" /><category term="Frontend Development" /><category term="React" /><category term="Vite" /><category term="component library" /><category term="react" /><category term="typescript" /><category term="accessibility" /><category term="testing" /><category term="ui components" /><category term="design system" /><category term="tree shaking" /><category term="vitest" /><category term="react testing library" /><category term="frontend architecture" /><summary type="html"><![CDATA[At QuestionPro, we are building a component library for all our satellite projects using React. During this process, we learned a lot about setting up a project and publishing it to npm. This is not a step-by-step tutorial but rather a collection of key takeaways that might help you if you’re working on a component library.]]></summary></entry></feed>