9: Node.js Core Streams
With Guest Tim Oxley (@timoxley)
- Panelists @ctefanache, @nevraeka
- Recorded 10:00 AM EST 09/24/2015
- YouTube https://www.youtube.com/watch?v=TGVleRA_VDM
Tim Oxley (@timoxley) joins us to walk through Node.js core streams. Streams have been through a lot of churn in Node core and remain one of the more debated parts of the codebase. Tim demonstrates what actually happens when we use streams in Node, how to debug Node core, and much more.
Tim lives in Singapore and works at NodeSource, where he uses Node every day; he is deeply familiar with Node core and the JavaScript community. He spoke a bit about streams on the NodeUp podcast, which is what led us to invite him on.
const fs = require('fs')
const filename = process.argv[2]
fs.readFile(filename, 'utf8', (err, data) => {
  if (err) throw err
  data = data.toUpperCase()
  fs.writeFile(`transformed_${filename}`, data, err => {
    if (err) throw err
  })
})

> cat data-0.txt
> node readfile-buffering.js data-0.txt
> cat transformed_data-0.txt
> node readfile-buffering.js data-300.txt
buffer.js:369
    throw new Error('toString failed');
    ^
Error: toString failed
    at Buffer.toString (buffer.js:369:11)
# Ran out of memory (and exposed a Node bug):
https://github.com/nodejs/node/issues/2767
> node readfile-buffering.js data-100.txt
const fs = require('fs')
const stream = require('stream')
const upperCaser = new stream.Transform({
  transform(chunk, encoding, done) {
    this.push(String(chunk).toUpperCase())
    done()
  }
})
const filename = process.argv[2]
fs.createReadStream(filename)
  .pipe(upperCaser)
  .pipe(fs.createWriteStream(`transformed_${filename}`))

> node readfile-streaming.js data-0.txt
> cat transformed_data-0.txt
# Success
> node readfile-streaming.js data-100.txt
# Success!
> node readfile-streaming.js data-300.txt
# Success! Streaming handled the file that buffering couldn't.
Streams vs Buffers

Demo runs comparing memory use (a rough way to measure this yourself is sketched after the list):
- Streaming reading 800MB
- Streaming reading 300MB
- Streaming reading 100MB
- Buffered reading 100MB
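A sketch (not from the episode) for eyeballing the difference yourself: copy a file through a pipe while periodically logging resident set size, and swap fs.createReadStream for fs.readFile to watch the buffered version balloon. The copied_ output name is just for illustration.

const fs = require('fs')
const filename = process.argv[2]

// Log resident set size every 500 ms so you can watch memory while the copy runs.
let peak = 0
const timer = setInterval(() => {
  const rss = process.memoryUsage().rss
  peak = Math.max(peak, rss)
  console.error('rss: %d MB', Math.round(rss / 1024 / 1024))
}, 500)

fs.createReadStream(filename)
  .pipe(fs.createWriteStream(`copied_${filename}`))
  .on('finish', () => {
    clearInterval(timer)
    console.error('peak rss: %d MB', Math.round(peak / 1024 / 1024))
  })

The next snippet is the same transform demo again, this time with an explicit highWaterMark: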
const fs = require('fs')
const stream = require('stream')
const upperCaser = new stream.Transform({
  transform(chunk, encoding, done) {
    this.push(String(chunk).toUpperCase())
    done()
  }
})
const filename = process.argv[2]
fs.createReadStream(filename, {
  highWaterMark: 1024 * 16 // adjust this to balance memory usage vs throughput
})
  .pipe(upperCaser)
  .pipe(fs.createWriteStream(`transformed_${filename}`))
- Streaming reading 100MB with high water mark set to 1 byte
- Streaming reading 100MB with high water mark set to 32kb
- Streaming reading 100MB with high water mark set to 32kb & a console.log on each chunk
- Streaming reading 800MB
- Streaming reading 300MB
- Streaming reading 100MB
- Buffered reading 100MB
Packages often implement a stream-like interface that isn't actually a stream. The classic example is an event emitter that emits 'data' events but supports none of the rest of the stream contract.
const EventEmitter = require('events')
class MyAPI extends EventEmitter {
  read (data) {
    this.parse(data).forEach(chunk => {
      this.emit('data', chunk)
    })
  }
  // ...
}

Problems with this approach:

- Inconsistent API.
- No composability via pipe.
- No backpressure.
- No buffering.

Better: extend Readable from readable-stream instead:
const Readable = require('readable-stream').Readable
class MyAPI extends Readable {
  _read (size) { // called when the consumer wants more data
    this.parse(data).forEach(chunk => { // `data` comes from wherever your API gets it
      this.push(chunk) // push() replaces emit('data')
    })
  }
  // ...
}

https://github.com/nodejs/node/tree/master/lib
https://github.com/nodejs/readable-stream - originally written by isaacs for streams2.
Using readable-stream protects your codebase from future changes to core streams.
The simplest implementation of a 'Stream':

stream.emit('data', chunk)

https://github.com/nodejs/node/blob/master/lib/stream.js#L26-L37
https://github.com/nodejs/node/blob/master/lib/stream.js#L39-L45
https://github.com/nodejs/node/blob/master/lib/stream.js#L47-L60
This is the simplest implementation of a stream, but it is limited in usefulness (a minimal example follows this list), as it:
- Is a push stream
- Does not support buffering
- Ideally this code isn't actually run.
- Mainly kept for legacy reasons, e.g. maybeStream instanceof Stream checks: https://github.com/nodejs/node/issues/2961
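To make those limitations concrete, here is a minimal sketch (not from the episode) of the bare legacy Stream: it is just an EventEmitter with pipe() bolted on, data is pushed at the destination whether or not it is ready, and nothing is buffered for consumers that attach late.

const { Stream } = require('stream')

const source = new Stream()
source.pipe(process.stdout)

// Push style: chunks go out as soon as they are emitted, with no backpressure,
// and anything emitted before pipe()/on('data') is simply lost.
source.emit('data', 'hello ')
source.emit('data', 'world\n')
source.emit('end')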
Error handling is YOLO by default: if nothing is listening for 'error', the error is simply thrown.
https://github.com/nodejs/node/blob/master/lib/stream.js#L70-L76
a
  .pipe(b)
  .pipe(c)
  .pipe(d)
  .pipe(e)

Errors don't propagate through pipe, so catching them means attaching a handler at every step:

a
  .pipe(b).on('error', onError)
  .pipe(c).on('error', onError)
  .pipe(d).on('error', onError)
  .pipe(e).on('error', onError)

This is kinda crap, IMO, because it means most people simply don't attach error handlers when piping. Chaining streams together with pipe is one of the nicest parts of streams, but it's rather spoiled when you need to attach error handlers all over the place.
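One userland fix for this (an aside, not covered in the notes above) is mafintosh's pump module: it pipes the streams together, funnels an error from any of them into a single callback, and destroys the rest of the pipeline when one stream fails. The file names here are just for illustration.

const fs = require('fs')
const zlib = require('zlib')
const pump = require('pump') // npm install pump

pump(
  fs.createReadStream('data.txt'),
  zlib.createGzip(),
  fs.createWriteStream('data.txt.gz'),
  err => {
    // one place to handle an error from any stream in the chain
    if (err) console.error('pipeline failed:', err)
    else console.log('pipeline succeeded')
  }
)

Node core later picked up the same idea as stream.pipeline.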
- Readable
- Writable
- Duplex
- Transform
- PassThrough
- Buffering is bad when the buffers are too big.
- Memory vs Throughput
- Allocates a slab of untyped memory.
- Basically a malloc
- Backed by TypedArrays
- TypedArrays before we had TypedArrays
- Strings & Buffers Only
> process.stdout.write({object: 'data'})
TypeError: invalid data
at WriteStream.Socket.write (net.js:617:11)
...
> process.stdout.write(JSON.stringify({object: 'data'}))
{"object":"data"}
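A quick check of the Buffer points above (a sketch, not from the episode): Buffers really are TypedArray views, and they can be written to a normal stream directly, just like strings.

const buf = Buffer.from('hello')

console.log(buf)                               // <Buffer 68 65 6c 6c 6f>
console.log(buf instanceof Uint8Array)         // true - Buffers are TypedArray views
console.log(buf.buffer instanceof ArrayBuffer) // true - backed by raw memory

process.stdout.write(buf)  // fine: Buffers and strings are what ordinary streams carry
process.stdout.write('\n')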
- Any JS object, except null (pushing null signals the end of the stream)
- e.g. https://github.com/dominictarr/JSONStream
const JSONStream = require('JSONStream')
const { Transform } = require('stream')

const parser = JSONStream.parse(['rows', true, 'doc'])

parser
  .pipe(new Transform({
    objectMode: true,
    transform (chunk, enc, done) {
      console.log(typeof chunk) // 'object'
      this.push(JSON.stringify(chunk))
      done()
    }
  }))
  .pipe(process.stdout)
Readable - a buffered "source" of data.
https://github.com/nodejs/readable-stream/blob/master/lib/_stream_readable.js
Writable - a buffered "sink" for data.
https://github.com/nodejs/readable-stream/blob/master/lib/_stream_writable.js
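The notes above only show a custom Readable; for symmetry, here is a minimal custom Writable (an illustration, not from the episode): a sink that counts bytes, with incoming chunks queuing in the writable's internal buffer until each write callback signals it is ready for more.

const fs = require('fs')
const { Writable } = require('stream')

const counter = new Writable({
  write(chunk, encoding, callback) {
    this.bytes = (this.bytes || 0) + chunk.length
    callback() // signal readiness for the next chunk
  }
})

fs.createReadStream(process.argv[2] || __filename)
  .pipe(counter)
  .on('finish', () => console.log('%d bytes written', counter.bytes))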
Duplex - a stream that's both Readable and Writable.
https://github.com/nodejs/readable-stream/blob/master/lib/_stream_duplex.js
A TCP socket as created by net is a duplex stream:
import net from 'net'
import {Transform} from 'stream'
net.createServer(socket => {
  socket.pipe(new Transform({
    transform(chunk, enc, done) {
      console.log('server got %s', chunk)
      this.push(String(chunk).toUpperCase())
      this.push(null)
      done()
    }
  })).pipe(socket)
}).listen(3000)

const client = net.connect(3000)
client.write('hello')
client.pipe(process.stdout)
client.on('end', () => process.exit(0))

Transform - a Duplex stream with a "map" function that maps input to output.
https://github.com/nodejs/readable-stream/blob/master/lib/_stream_transform.js
PassThrough - a Transform that passes its input through unchanged (an identity transform).
https://github.com/nodejs/readable-stream/blob/master/lib/_stream_passthrough.js
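A PassThrough makes a handy tap for observing data mid-pipeline without changing it; a small sketch (an illustration, not from the episode):

const fs = require('fs')
const { PassThrough } = require('stream')

const tap = new PassThrough()
tap.on('data', chunk => console.error('saw %d bytes', chunk.length))

fs.createReadStream(process.argv[2] || __filename)
  .pipe(tap)            // identity transform: bytes pass through unchanged
  .pipe(process.stdout)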
https://github.com/mafintosh/duplexify - combines a separate Writable and Readable into a single Duplex stream.
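A minimal duplexify sketch, assuming the package is installed (the uppercasing wiring is just for illustration):

const duplexify = require('duplexify')
const { PassThrough } = require('stream')

const input = new PassThrough()  // the duplex's writable side writes here
const output = new PassThrough() // the duplex's readable side reads from here
const dup = duplexify(input, output)

// Wire the two halves together however you like.
input.on('data', chunk => output.write(String(chunk).toUpperCase()))
input.on('end', () => output.end())

dup.pipe(process.stdout) // prints HELLO
dup.end('hello\n')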
One of the things that surprises people when they first start working with HTTP in Node is that the body isn't just sitting there on the request or response object by default: it arrives as a stream.
HTTP Request and Response are Readable & Writable respectively.
Common to see stuff like:
Slow!
const http = require('http')

http.createServer((req, res) => {
  var body = ''
  req.on('data', data => {
    body += data
  })
  req.on('end', () => {
    JSON.parse(body) // or whatever
  })
})

Concatenating Buffers instead:

http.createServer((req, res) => {
  let body = Buffer.alloc(0)
  req.on('data', data => {
    body = Buffer.concat([body, data])
  })
  req.on('end', () => {
    JSON.parse(String(body)) // or whatever
  })
})

Or collecting the chunks and concatenating once at the end:

http.createServer((req, res) => {
  let pieces = []
  req.on('data', data => {
    pieces.push(data)
  })
  req.on('end', () => {
    const body = Buffer.concat(pieces)
    JSON.parse(String(body)) // or whatever
  })
})

Or just use bl (Buffer List) from userland:

import bl from 'bl'
http.createServer((req, res) => {
  req.pipe(bl((err, body) => {
    // TODO handle err
    JSON.parse(String(body)) // or whatever
  }))
})

Don't be afraid to use third-party packages. This stuff isn't really designed to be used raw: that's the whole point of having a minimal core. Rather than baking too many opinions into Node core, users are asked to lean heavily on userland.
- Substack
- Dominic Tarr
- Rod Vagg
- Julian Gruber
- Max Ogden
- Mafintosh
- Feross
- LevelDB
- Domenic Denicola & WhatWG https://github.com/whatwg/streams
- https://github.com/kriskowal/gtor
- https://github.com/substack/stream-handbook
- https://github.com/substack/stream-adventure
- Flow-based Programming
- Reactive Extensions
- BaconJS
- Kefir