Unveiling Real-Time Responses with ChatGPT: A Dive into Server-Sent Events

Method	Name	Status	Type	Last Message
GET	/sse	200	text/event-stream	far,

Once upon a time, in a land|

_________

Hello everyone! 🚀

Imagine this scenario: you're interacting with ChatGPT, and it feels like it's responding to you in real time, just like a person on the other end of the chat. 💻
But how does this magical 🪄 system respond so promptly? Does it use WebSockets¹ or polling²? In reality, the solution it employs is much simpler: Server-Sent Events (SSE)!

💥 "SSE? What's that?"
That's exactly what I thought when I first encountered it, but I was amazed by its effectiveness. Eager to experiment, I immediately built a prototype using Astro + a Netlify Edge Function.
PS: the prototype is the one you see at the top of this article, of course.

If you open the Inspector and go to the Network tab (📣 remember to refresh the page), you'll notice that my site makes a request to "/sse". What's interesting is that the response's Content-Type is "text/event-stream", and on the right you'll see a stream of events with Type, Data, and Time.
The magic happens thanks to an API route that sends one word at a time, pausing for a random 200 to 800 ms between words (I've randomized the delay to mimic how a real server, much like ChatGPT, produces its answer gradually).
At the end of the text, I send an "end-of-stream" event to instruct my client to close the EventSource.
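
To make the wire format concrete, here is a minimal sketch of how a single event is laid out in a "text/event-stream" response. The helper name formatSseEvent is my own assumption for illustration and is not part of the prototype; it only shows the "event:" and "data:" fields and the blank line that ends each event.

```javascript
// Hypothetical helper (not part of the original prototype): builds one SSE frame.
function formatSseEvent({ event, data }) {
  const lines = [];
  // The "event:" field is optional; without it the browser treats the
  // frame as a plain "message" event.
  if (event) lines.push(`event: ${event}`);
  // A payload that contains newlines must be sent as several "data:" lines.
  for (const line of String(data).split("\n")) {
    lines.push(`data: ${line}`);
  }
  // A blank line terminates the event.
  return lines.join("\n") + "\n\n";
}

console.log(JSON.stringify(formatSseEvent({ data: "Once" })));
console.log(
  JSON.stringify(formatSseEvent({ event: "end-of-stream", data: "Stream ended" }))
);
```

The two calls reproduce the two kinds of frames the mock server sends: an unnamed word event and the named "end-of-stream" event.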

Here's the mock server code (it's a Netlify Edge Function):

function sleep(ms) {
   return new Promise((resolve) => setTimeout(resolve, ms));
}

async function sendStory() {
   const story =
      `Once upon a time, in a land far, far away, Federico Bartoli was born, and he wanted to be a great developer with many dinos in his blog. I have to write many words 'cause I want to show you the power of server-sent events.`.split(
         " "
      );
   const textEncoder = new TextEncoder();

   let wordIndex = 0;

   return new Response(
      new ReadableStream({
         async start(controller) {
            const sendWord = async () => {
               while (wordIndex < story.length) {
                  const wordBuffer = textEncoder.encode(
                     `data: ${story[wordIndex]}\n\n`
                  );
                  controller.enqueue(wordBuffer);
                  wordIndex++;
                  let randomSpeed = Math.floor(
                     Math.random() * (800 - 200 + 1) + 200
                  );
                  await sleep(randomSpeed);
               }
               const endBuffer = textEncoder.encode(
                  `event: end-of-stream\ndata: Stream ended\n\n`
               );
               controller.enqueue(endBuffer);
               controller.close();
            };

            await sendWord(); // Start sending words
         },
      }),
      {
         headers: {
            "Content-Type": "text/event-stream",
            "Cache-Control": "no-cache",
            Connection: "keep-alive",
            // The value must match the browser's Origin header exactly (no trailing slash)
            "Access-Control-Allow-Origin": "https://federicobartoli.it",
         },
      }
   );
}

export default () => sendStory();

export const config = { path: "/sse" };

The client side is equally simple and incredibly awesome! It involves creating an instance of EventSource, which opens a persistent connection to an HTTP server. This connection remains open until EventSource.close() is called. In my case, I close the EventSource upon receiving the "end-of-stream" event.

Here's part of the client-side code (only the useEffect):

   useEffect(() => {
      // Same path the edge function is registered on (config.path above)
      const eventSource = new EventSource("/sse");
      eventSource.addEventListener("end-of-stream", () => {
         eventSource.close();
      });

      eventSource.onmessage = (event) => {
         const newWord = event.data;
         setCurrentMessage((prevMessage) =>
            prevMessage ? prevMessage + " " + newWord : newWord
         );
      };

      return () => {
         eventSource.close();
      };
   }, []);
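
Why does "end-of-stream" need addEventListener while the words arrive through onmessage? EventSource dispatches a frame that carries an "event:" field under that name, and everything else as a plain "message". The sketch below is a deliberately simplified parser that illustrates that rule; parseSseEvents is a hypothetical name, and it assumes complete events with "\n" line endings while ignoring the "id:" and "retry:" fields and comments that a real EventSource also handles.

```javascript
// Hypothetical, simplified parser: illustrates how frames map to event types.
function parseSseEvents(text) {
  const events = [];
  // Events are separated by a blank line.
  for (const block of text.split("\n\n")) {
    if (block.trim() === "") continue;
    let type = "message"; // default type when no "event:" field is present
    const dataLines = [];
    for (const line of block.split("\n")) {
      if (line.startsWith("event:")) {
        type = line.slice("event:".length).trim();
      } else if (line.startsWith("data:")) {
        // One optional space after the colon is not part of the payload.
        dataLines.push(line.slice("data:".length).replace(/^ /, ""));
      }
    }
    events.push({ type, data: dataLines.join("\n") });
  }
  return events;
}

const sample =
  "data: Once\n\ndata: upon\n\nevent: end-of-stream\ndata: Stream ended\n\n";
console.log(parseSseEvents(sample));
```

In this sample the first two events have type "message" (what onmessage receives), and the last one has type "end-of-stream", which only a listener registered for that name will see.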

Simple and truly amazing, don't you think? Of course, I'll delve deeper into this topic, and if I find ways to improve this content, I'll update my blog post. Feel free to reach out to me on LinkedIn for further discussion or to provide feedback. Thanks, everyone!

P.S. ChatGPT employs Server-Sent Events (SSE) for delivering real-time responses to the client. SSE is a simple and efficient standard for sending real-time updates from the server to the client over HTTP, without requiring the client to send requests repeatedly. This unidirectional flow of data is suitable for scenarios where the server needs to push updates or notifications to the client, but the client doesn't need to communicate back to the server in real-time. By utilizing SSE, ChatGPT can efficiently deliver real-time responses with lower complexity and overhead compared to a bi-directional communication protocol like WebSockets. Additionally, SSE is built upon the HTTP protocol, making it a natural fit for web applications and easier to implement with existing web technologies.

Useful resources:
https://developer.mozilla.org/en-US/docs/Web/API/EventSource
https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events

¹ WebSockets establish a full-duplex communication channel that operates through a single, long-lived connection between a client and server. This full-duplex capability allows both the client and server to send data to each other simultaneously, facilitating real-time interactions. Unlike traditional HTTP, where each interaction necessitates a new request-response cycle, WebSockets keep the connection open, enabling continuous data flow in both directions. This bi-directional communication is particularly useful in applications that require real-time updates and interactions, such as live chat applications, online gaming, and financial trading platforms.
² Polling is a technique where the client periodically queries the server to check for new data. Though simple to implement, polling can be resource-intensive if the intervals are short, and may introduce delays in data updates if the intervals are long. It's a suitable choice for applications where real-time updates are not critical.
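
For contrast with the SSE approach, here is a rough sketch of what polling might look like. Everything in it is an assumption for illustration: startPolling, fetchLatest, and onChange are hypothetical names, and the prototype does not use this code. It shows the trade-off described above: the client asks again every intervalMs whether or not anything has changed.

```javascript
// Hypothetical helper, not from the original post: asks for the latest value
// every `intervalMs` and calls `onChange` only when the value differs.
function startPolling(fetchLatest, onChange, intervalMs) {
  let lastValue;
  const timer = setInterval(async () => {
    try {
      const value = await fetchLatest();
      if (value !== lastValue) {
        lastValue = value;
        onChange(value);
      }
    } catch (error) {
      // A failed request is logged and simply retried on the next tick.
      console.error("poll failed:", error);
    }
  }, intervalMs);
  // The caller stops polling by invoking the returned function.
  return () => clearInterval(timer);
}
```

A caller would pass its own request function as fetchLatest (for example, a fetch call against an endpoint of its choosing) and keep the returned function around to stop polling when the component unmounts.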