Node.js Stream Dreams

Allie Jones

One of the best parts of Node.js is its non-blocking, asynchronous I/O event handling, but what does that really mean?

I/O, or input/output, is the communication from one system to another. We want applications to perform tasks simultaneously, yet remain responsive to user interaction. We can use Node.js to make that happen. Node.js uses an event loop to execute requests without waiting, or getting "blocked", for a returned response. Node.js does this by combining its V8 engine with the libuv abstraction layer to handle a high number of requests on a single thread.

A thread is a block of code that runs independently of other code, either in parallel on a separate processor or by way of a scheduler, and it doesn't need to wait for other threads to complete. A program can make a request and move on to the next request before the first one returns a response. This is similar to being at a restaurant: you can order food while other guests order theirs at the same time. The second guest does not have to wait for guest one to get food (response) before placing an order (request). This dining experience is asynchronous in nature.

Synchronous event handling occurs in a sequential manner. Traditional synchronous web services must open a thread for each event, and the service must wait for each process to finish before continuing. Larger applications can have multiple threads, but each one is still bound to a sequential order of execution. This is similar to being at a deli and waiting in line to order. You are "blocked" from ordering (request) until the person in front of you receives their food (response). This sequential order of events is synchronous in nature.
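For contrast, here is a minimal sketch of what that blocking behavior looks like in Node.js itself, using fs.readFileSync (the file name is a placeholder for the trex.txt file we read below):

var fs = require('fs');

// Blocking: execution stops here until the whole file has been read into memory.
var text = fs.readFileSync('./trex.txt', 'UTF-8');
console.log(`File Read ${text.length}`);

// This line can only run after the read above has completed.
console.log('Reading File.');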

The ability to execute requests asynchronously, without waiting for the first request's response, is what makes Node.js so fast and scalable, and it makes Node.js ideal for handling inbound and outbound tasks (I/O).

For example, let's say we want to take a large file (input) and serve it to users (output). We have a large text file about Tyrannosaurus Rex that we want to read. We could use fs.readFile() to read this file like so:

var fs = require('fs');

// readFile buffers the entire file into memory before invoking the callback.
fs.readFile("./trex.txt", "UTF-8", function (err, trex) {
  console.log(`File Read ${trex.length}`);
});

// This line runs while the file is still being read.
console.log("Reading File.");

When we run this file with node trex-readFile.js it returns:

Reading File.
File Read 139440

We can see from the output above that "Reading File." printed before the readFile() function finished executing. The console.log was able to execute asynchronously while fs.readFile() was still reading the file.

The readFile() function uses a callback to return and log the length of the file. The problem with readFile() is that it won't invoke the callback until the entire file has been read and buffered into memory.

This can cause problems when we are dealing with large amounts of data and/or high traffic. It would be better in most circumstances to avoid buffering data altogether and only generate the data when the consumer asks for it. A better way to handle this data transfer would be to use Node.js Streams.

Streams

The Node.js stream module lets you read data from a source or write data to a destination in a continuous fashion. This comes in handy when you are reading large data files. You can query, gzip, and pipe large files into a new destination without having to buffer the whole file into memory.

The real beauty of streams is that we don’t have to buffer the entire request into memory all at once, a process that takes up time, space and bandwidth.

Some common uses of streams in Node.js are:

  • HTTP/TCP requests and responses
  • Reading and writing files
  • Compressing/decompressing files

For example, the HTTP server receives a request (a readable stream) and sends a response (writable stream). The fs module also lets us work with both readable and writable file streams.
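As a small sketch of that idea (assuming a local trex.txt and an arbitrary port), an HTTP server can pipe a file's readable stream straight into the writable response:

var fs = require('fs');
var http = require('http');

http.createServer(function (req, res) {
  // req is a readable stream; res is a writable stream.
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  fs.createReadStream('./trex.txt').pipe(res);
}).listen(3000);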

In Node.js there are 3 types of streams:

  • Readable
  • Writable
  • Duplex/Transform (both readable and writable)

For the scope of this blog article, we'll discuss Readable and Writable streams.

Readable Streams

Readable streams let us read data from a source without having to buffer the whole file into memory. Readable streams emit data events each time they get a “chunk” of data and then emit end when they are finished.

The original implementation of streams was all about emitting events (push data in), whereas streams2 lets code read chunks at its own rate (pull data in). There is also a streams3 approach that lets you mix and match both approaches.
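As a rough sketch of the pull-based style (using the same trex.txt file), we can listen for the 'readable' event and call read() to pull chunks at our own pace:

var fs = require('fs');
var stream = fs.createReadStream('./trex.txt', 'UTF-8');

stream.on('readable', function () {
  var chunk;
  // read() returns null once the internal buffer has been drained.
  while ((chunk = stream.read()) !== null) {
    console.log(`pulled a chunk of ${chunk.length} characters`);
  }
});

stream.on('end', function () {
  console.log('No more data to pull.');
});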

We want to use a Readable stream when supplying data as a stream. Let's go back to the Trex example from above. When we logged the output of our file:

console.log(`File Read ${trex.length}`);

We got one response back once the process was finished:

 File Read 139440

This means the entire file needed to be processed before moving forward. Now let's replace readFile() with a read stream.

First we need to require the fs module, then create a readable stream and give it a text encoding. We will also create a data string variable onto which we will concatenate the chunks as they arrive. As mentioned above, read streams emit data events each time they get a chunk of data. To demonstrate that, we will use the .on event emitter to listen for "data" events, and use the process.stdout.write method to print the length of each chunk to the terminal while concatenating the chunk onto our data variable.

For clarity, we will use a .once listener to log "Started Reading The Trex File" when the stream starts. We will also add another listener for the "end" event.

ReadStream example:


var fs = require('fs');
var stream = fs.createReadStream('./trex.txt', 'UTF-8');
var data = '';

stream.once('data', function () {
  console.log('\n');
  console.log('Started Reading The Trex File');
});

stream.on('data', function (chunk) {
  process.stdout.write(`chunk: ${chunk.length} \n`);
  data += chunk;
});

stream.on('end', function () {
  console.log(`Finished Reading The Trex File ${data.length}`);
  console.log('\n');
});

When we run this file we get:


Started Reading The Trex File
chunk: 65291
chunk: 64565
chunk: 9584
Finished Reading The Trex File 139440

We can see the createReadStream sent the data over in three chunks. Although this file is relatively small, streaming makes a big difference for larger files or repeated requests. By using streams we don't have to wait for all of the data to buffer before acting on the response. This continuous flow of data makes for more efficient request and response (I/O) handling.

Writable Streams

We can use writable streams when we want to collect or manipulate data. Streams also ensure that data chunks arrive in the correct order. For simplicity, the strings are written out below, but we could stream dynamic data to a new file instead. Other useful cases include writing log files on events or capturing user input data. For writable streams we need to call two methods: write() and end().

Writable Stream example:

var fs = require('fs');

var stream = fs.createWriteStream('tinyarms.txt');
stream.write('Tyrannosaurus means "tyrant lizard". \n');
stream.write('T-Rex hates pushups.');
stream.end();

This creates a file called tinyarms.txt and adds our two new strings: Tyrannosaurus means "tyrant lizard" and T-Rex hates pushups.
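One thing to note: end() only signals that we are finished writing. If we need to know when all of the data has actually been flushed, we can listen for the 'finish' event; here is a minimal sketch built on the example above:

var fs = require('fs');

var stream = fs.createWriteStream('tinyarms.txt');

// 'finish' fires after end() is called and all data has been flushed.
stream.on('finish', function () {
  console.log('Done writing tinyarms.txt');
});

stream.write('Tyrannosaurus means "tyrant lizard". \n');
stream.write('T-Rex hates pushups.');
stream.end();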

Pipes

We can also take advantage of the pipe method to chain streams together. By using pipes we can take a readable stream and pipe it into a writable stream, transforming the data along the way. For example, let's take the trex.txt file, compress it, and then pipe it into a writable stream.

var fs = require('fs');
var zlib = require('zlib');

var readstream = fs.createReadStream('trex.txt');
var compress = zlib.createGzip();
var writestream = fs.createWriteStream('trex.txt.gz');
readstream.pipe(compress).pipe(writestream);

Here .pipe() takes care of listening for the 'data' and 'end' events. Using pipes creates cleaner, more readable code while allowing us to chain events. Using .pipe() has other benefits too, like handling backpressure automatically. Backpressure builds up when a slow consumer (a slow connection or a slow disk, for example) can't keep up with the producer; .pipe() pauses the readable stream instead of letting data pile up in memory.
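As an aside, newer versions of Node.js (10 and up) also ship a stream.pipeline() helper that wires up the same chain while forwarding errors and cleaning up the streams for us; a minimal sketch:

var fs = require('fs');
var zlib = require('zlib');
var pipeline = require('stream').pipeline;

pipeline(
  fs.createReadStream('trex.txt'),
  zlib.createGzip(),
  fs.createWriteStream('trex.txt.gz'),
  function (err) {
    // The callback runs once, with an error if any stream in the chain failed.
    if (err) {
      console.error('Pipeline failed:', err);
    } else {
      console.log('Pipeline succeeded.');
    }
  }
);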

Piping it all together

Streams are like pipelines that help control and manage data flow. We don't need to read in all the data at once; instead, we can read data continuously in small chunks and take advantage of the asynchronous nature of Node.js for I/O event handling. Streams are not a new concept in programming, either; they stem from the Unix pipes concept and philosophy.

"This is the Unix philosophy: write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface," said Doug McIlroy, the Inventor of Unix pipes and one of the founders of the Unix tradition.

Taking advantage of streams will greatly improve scalability and decrease memory usage.

Want to learn more about Node.js?

Zivtech hosts the Philly Node.js meetup every second Tuesday of the month. We provide pizza and beer for an informal meetup to discuss Node.js. The Philly Node.js meetup is a space where people can learn, collaborate and share Node.js code.
