The document discusses various concepts related to programming and physics, including:
- There are physical limits to what hardware can do based on laws of physics.
- Arrays can be inefficient for storing large amounts of data and other methods may be better.
- Streams provide a standard way to access input and output in a linear, chunk-based fashion and are widely used across programming languages and systems.
2. THE LAWS OF PHYSICS ALWAYS APPLY
• CPUs use electricity and produce heat
• Even computers “in the cloud” are on physical hardware
• There is a limit to the amount of throughput a NIC can push
• The larger the data the longer it takes to move it, and the more surface it takes to store it
3. ARRAYS ARE EVIL
• There are other ways to store data that are more efficient
• They should be used only for small amounts of data
• No matter how hard you try, there is C overhead
4. USE THE ITERATION, LUKE
• Lazy fetching exists for database fetching – use it!
• Always page (window) your result sets from the database – ALWAYS
• Use filters or generators to format or alter results on the fly
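A minimal sketch of the paging and generator advice above. The fetchPage() function is a hypothetical stand-in for a real LIMIT/OFFSET query, and the 'name' column is an assumption made for illustration; only one page of rows is held at a time.

```php
<?php
// Hypothetical stand-in for a paged query such as
// "SELECT ... LIMIT :limit OFFSET :offset". A real version would hit the database.
function fetchPage(array $table, int $offset, int $limit): array
{
    return array_slice($table, $offset, $limit);
}

// Walk the whole result set one page (window) at a time.
function pagedRows(array $table, int $pageSize): Generator
{
    for ($offset = 0; ; $offset += $pageSize) {
        $page = fetchPage($table, $offset, $pageSize);
        if ($page === []) {
            return; // no more rows
        }
        foreach ($page as $row) {
            yield $row;
        }
    }
}

// Format rows on the fly instead of building a second array.
function formatNames(iterable $rows): Generator
{
    foreach ($rows as $row) {
        yield strtoupper($row['name']);
    }
}

$table = [['name' => 'ann'], ['name' => 'bob'], ['name' => 'cy']];
foreach (formatNames(pagedRows($table, 2)) as $name) {
    echo $name, "\n";
}
```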
5. STREAM YOUR DATA
• Work on chunks at a time
• Seek back and forth through data if necessary
• Use PHP streams as they were meant to be used
6. STREAMS: COMPUTING CONCEPT
Definitions
• Idea originating in the 1950s
• Standard way to get Input and Output
• A source or sink of data
Who uses them
• C – stdin, stderr, stdout
• C++ iostream
• Perl IO
• Python io
• Java
• C#
7. WHAT IS A STREAM?
• Access input and output generically
• Can write and read linearly
• May or may not be seekable
• Comes in chunks of data
8. WHAT USES STREAMS?
• EVERYTHING
• include/require(_once)
• stream functions
• file system functions
• many other extensions
11. WHAT ARE FILTERS?
• Performs operations on stream data
• Can be prepended or appended (even on the fly)
• Can be attached to read or write
• When a filter is added for read and write, two instances of the filter are created.
13. THINGS TO WATCH FOR!
• Data has an input and output state
• When reading in chunks, you may need to cache in between reads to make filters useful
• Use the right tool for the job
14. PROCESS WITH THE APPROPRIATE TOOLS
• Load data into the appropriate place for processing
• Hint – arrays are IN MEMORY – that is generally not an appropriate place for processing
• Datastores are meant for storing and retrieving data, use them
15. OFFLOAD WORK
• Put work items in queues and inform the user when they’re completed
• It’s not realistic to expect complex reports to be done in seconds; physics applies here too
• Caching complex work items is a good way to balance offloaded work with immediate results
16. COMMUNICATE WITH OTHER PROCESSES
• Microservices are in essence job systems that communicate via HTTP
• You can overload them to work via Unix sockets as well
• Ratchet or other WebSocket solutions allow for heavy work with multiplexed communication
• PHP can run in daemons, and even listen and communicate over sockets
18. DEFINITIONS
• Socket
• Bidirectional network stream that speaks a protocol
• Transport
• Tells a network stream how to communicate
• Wrapper
• Tells a stream how to handle specific protocols and encodings
20. THAT SOCKETS EXTENSION…
• New APIs in streams and filesystem functions are replacements
• Extension is very low level
• stream_socket_server
• stream_socket_client
No matter how many virtual machines you throw at a problem, you always have the physical limitations of hardware. Memory, CPU, and even your NIC's throughput have finite limits. Are you trying to load that 5 GB CSV into memory to process it? No really, you shouldn't! PHP has many built-in features to deal with data in more efficient ways than pumping everything into an array or object. Using PHP streams and stream filtering mechanisms you can work with chunked data in an efficient manner; with sockets and processes you can farm out work efficiently and still keep track of what your application is doing. These features can help with memory, CPU, and other physical system limitations to help you scale without the giant AWS bill.
In PHP 5.x a whopping 144 bytes per element were required. In PHP 7 the value is down to 36 bytes, or 32 bytes for the packed case, but it’s STILL not the best
Quick computer science lesson
Originally done with magic numbers in Fortran; C and Unix standardized the way it worked
On Unix and related systems based on the C programming language, a stream is a source or sink of data, usually individual bytes or characters. Streams are an abstraction used when reading or writing files, or communicating over network sockets. The standard streams are three streams made available to all programs.
Who else uses them? Most languages descended from C have the “files as streams” concept and ways to extend the IO functionality beyond merely files, which allows all kinds of IO to be handled together
Great way to standardize the way data is grabbed and used
Questions on who has used streams in other languages
Streams are a huge underlying component of PHP
Streams were introduced with PHP 4.3.0 – they are old, but underuse means they can have rough edges… so TEST TEST TEST
But they are more powerful than almost anything else you can use
Why is this better ?
Lots and lots of data in small chunks lets you do large volumes without maxing out memory and cpu
Any good extension will use the underlying streams API to let you use any kind of stream
for example, cairo does this
The material on working with PHP streams is spread across at least two portions of the manual, plus appendixes for the built-in transports, filters, and context options. It’s very poorly arranged, so be sure to take the time to learn where to look in the manual; there should be three main places
What doesn’t use streams? chmod, touch, and some other very file-specific functionality; lazy or bad extensions; and extensions with issues in the libraries they wrap around
All input and output comes into PHP
It gets pushed through a streams filter
Then through the streams wrapper
During this point the stream context is available for the filter and wrapper to use
Streams themselves are the “objects” coming in
Wrappers are the “classes” defining how to deal with the stream
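To make the "wrappers are classes" idea concrete, here is a minimal sketch of a userland wrapper registered with stream_wrapper_register(). The "shout" scheme and the ShoutWrapper class are hypothetical names invented for this example; it is read-only and implements only the handful of methods needed for reading.

```php
<?php
// Hypothetical read-only wrapper: reading "shout://hello" yields "HELLO".
class ShoutWrapper
{
    public $context;          // populated by PHP if a stream context is passed
    private $data = '';
    private $pos = 0;

    public function stream_open($path, $mode, $options, &$opened_path)
    {
        // Everything after the scheme becomes the stream's content.
        $this->data = strtoupper(substr($path, strlen('shout://')));
        $this->pos = 0;
        return true;
    }

    public function stream_read($count)
    {
        // Hand back at most $count bytes and advance the position.
        $chunk = (string) substr($this->data, $this->pos, $count);
        $this->pos += strlen($chunk);
        return $chunk;
    }

    public function stream_eof()
    {
        return $this->pos >= strlen($this->data);
    }

    public function stream_stat()
    {
        return []; // no meaningful stat information for this toy stream
    }
}

stream_wrapper_register('shout', 'ShoutWrapper');

// Ordinary file functions now work with the new scheme.
echo file_get_contents('shout://hello'), "\n";
```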
Some notes: file_get_contents and its cousin stream_get_contents are your fastest, most efficient way if you need the whole file
file() is going to be the best way to get the whole file split by lines
Both are going to stick the whole file into memory at some point.
For very large files and to help with memory consumption, the use of fgets and fread will help
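A small sketch of the fread approach: the function below counts lines while holding only one chunk in memory at a time. The countLines() name and the 8192-byte default are assumptions for illustration, and php://temp stands in for a large file.

```php
<?php
// Count newline characters by reading fixed-size chunks rather than
// loading everything with file() or file_get_contents().
function countLines($stream, int $chunkSize = 8192): int
{
    $lines = 0;
    while (!feof($stream)) {
        $chunk = fread($stream, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break; // nothing more to read
        }
        $lines += substr_count($chunk, "\n");
    }
    return $lines;
}

// php://temp stands in for a big CSV here.
$stream = fopen('php://temp', 'w+');
fwrite($stream, "a,1\nb,2\nc,3\n");
rewind($stream);
echo countLines($stream, 4), "\n";
fclose($stream);
```

The same loop works unchanged on a file handle from fopen() on a real path, because both are streams.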
A filter is a final piece of code which may perform operations on data as it is being read from or written to a stream. Any number of filters may be stacked onto a stream. Custom filters can be defined in a PHP script using stream_filter_register() or in an extension using the API Reference in Working with streams. To access the list of currently registered filters, use stream_get_filters().
Stream data is read from resources (both local and remote) in chunks, with any unconsumed data kept in internal buffers. When a new filter is prepended to a stream, data in the internal buffers, which has already been processed through other filters will not be reprocessed through the new filter at that time. This differs from the behavior of stream_filter_append().
Filters are nice for manipulating data on the fly – but remember you’ll be getting data in chunks, so your filter needs to be smart enough to handle that
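One way to make a filter chunk-aware is to cache the incomplete tail of each chunk until the rest arrives. The sketch below numbers lines; the class name LineNumberFilter and the filter name demo.linenumber are hypothetical, and stream_set_chunk_size() is used only to try to force small chunks so the caching is exercised.

```php
<?php
// Hypothetical filter that prefixes each line with its line number.
// A chunk may end mid-line, so the partial tail is cached between calls.
class LineNumberFilter extends php_user_filter
{
    private $tail = '';
    private $line = 0;

    public function filter($in, $out, &$consumed, $closing): int
    {
        $data = $this->tail;
        while ($bucket = stream_bucket_make_writeable($in)) {
            $data .= $bucket->data;
            $consumed += $bucket->datalen;
        }

        $cut = strrpos($data, "\n");
        if ($closing) {                 // end of data: flush everything
            $ready = $data;
            $this->tail = '';
        } elseif ($cut === false) {     // no complete line yet: keep waiting
            $this->tail = $data;
            return PSFS_FEED_ME;
        } else {                        // emit complete lines, cache the rest
            $ready = substr($data, 0, $cut + 1);
            $this->tail = (string) substr($data, $cut + 1);
        }
        if ($ready === '') {
            return PSFS_FEED_ME;
        }

        $lines = explode("\n", $ready);
        $last = array_pop($lines);      // '' unless the final line is unterminated
        $result = '';
        foreach ($lines as $text) {
            $result .= ++$this->line . ': ' . $text . "\n";
        }
        if ($last !== '') {
            $result .= ++$this->line . ': ' . $last;
        }

        stream_bucket_append($out, stream_bucket_new($this->stream, $result));
        return PSFS_PASS_ON;
    }
}

stream_filter_register('demo.linenumber', 'LineNumberFilter');

function numberLines(string $text): string
{
    $stream = fopen('php://memory', 'w+');
    fwrite($stream, $text);
    rewind($stream);
    stream_set_chunk_size($stream, 4);  // ask for tiny chunks
    stream_filter_append($stream, 'demo.linenumber', STREAM_FILTER_READ);
    $result = stream_get_contents($stream);
    fclose($stream);
    return $result;
}

echo numberLines("alpha\nbeta\ngamma"), "\n";
```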
Filters can be appended or prepended – and attached to READ or WRITE
Notice that stream_filter_prepend and append are smart – if you opened with the r flag, by default it’ll attach to read, if you opened with the w flag, it will attach to write
Note: When a filter is added for read and write, two instances of the filter are created. stream_filter_prepend() must be called twice with STREAM_FILTER_READ and STREAM_FILTER_WRITE to get both filter resources.
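As a sketch of attaching a filter to one direction only, the example below puts the built-in string.toupper filter on the write chain, then removes it on the fly. The function name writeWithFilter() is an assumption for illustration.

```php
<?php
function writeWithFilter(): string
{
    $stream = fopen('php://temp', 'w+');

    // Attach the built-in filter to the write chain only.
    $filter = stream_filter_append($stream, 'string.toupper', STREAM_FILTER_WRITE);
    fwrite($stream, "hello streams\n");

    // Filters can be removed on the fly; later writes pass through untouched.
    stream_filter_remove($filter);
    fwrite($stream, "unfiltered\n");

    // Reading is unaffected because nothing was attached to the read chain.
    rewind($stream);
    $contents = stream_get_contents($stream);
    fclose($stream);
    return $contents;
}

echo writeWithFilter();
```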
Well it may look like manipulating data in a variable is preferable to the above. But the above is just a simple example. Once you add a filter to a stream it basically hides all the implementation details from the user. You will be unaware of the data being manipulated in a stream.
And also the same filter can be used with any stream (files, urls, various protocols etc.) without any changes to the underlying code.
Also multiple filters can be chained together, so that the output of one can be the input of another.
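A brief sketch of chaining, using two built-in filters on the read side so that the output of string.rot13 becomes the input of string.toupper. The helper name rot13Upper() is hypothetical.

```php
<?php
function rot13Upper(string $text): string
{
    $stream = fopen('php://memory', 'w+');
    fwrite($stream, $text);
    rewind($stream);

    // Filters run in the order they were appended.
    stream_filter_append($stream, 'string.rot13', STREAM_FILTER_READ);
    stream_filter_append($stream, 'string.toupper', STREAM_FILTER_READ);

    $result = stream_get_contents($stream);
    fclose($stream);
    return $result;
}

echo rot13Upper('hello'), "\n";
```

The same two stream_filter_append() calls would work on a file or network stream without changes, since filters attach to the stream rather than to its source.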
The filters need an input state and an output state. And they need to respect the fact that the number of requested bytes does not necessarily mean reading the same amount of data on the other end. In fact, the output side generally does not know whether less, the same amount, or more input is to be read. But this can be dealt with inside the filter. However, the filters should always return the number of input vs the number of output filters independently. Regarding states, we would be interested in whether reaching EOD on the input state meant reaching EOD on the output side prior to the requested amount, at the requested amount, or not at all yet (more data available).
What is streamable behavior? We’ll get to that in a bit
Protocol: set of rules which is used by computers to communicate with each other across a network
Resource: A resource is a special variable, holding a reference to an external resource
Talk about resources in PHP and talk about general protocols, get a list from the audience of protocols they can name (yes http is a protocol)
A socket is a special type of stream – pound this into their heads
A socket is an endpoint of communication to which a name can be bound. A socket has a type and one associated process. Sockets were designed to implement the client-server model for interprocess communication where:
In PHP, a wrapper ties the stream to the transport, so your http wrapper ties your PHP data to the http transport and tells it how to behave when reading and writing data
By default sockets are going to assume TCP, since that’s a pretty standard way of doing things. Notice that we have to do things the old-fashioned way just for this simple HTTP request: sticking our headers together and making sure stuff gets closed. However, if you can’t use allow_url_fopen, this is a way around it
A dirty, dirty way, but there you have it
Remember, allow_url_fopen only stops “drive-by” hacking
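A rough sketch of what the old-fashioned approach can look like: the headers are stuck together by hand and the socket is closed explicitly. The function names buildHttpRequest() and httpGet() and the host example.com are hypothetical, and the response is returned raw, status line and headers included.

```php
<?php
// Assemble a bare-bones HTTP/1.1 GET request by hand.
function buildHttpRequest(string $host, string $path = '/'): string
{
    return "GET {$path} HTTP/1.1\r\n"
         . "Host: {$host}\r\n"
         . "Connection: close\r\n"   // ask the server to close when done
         . "\r\n";
}

// Send the request over a plain TCP socket (fsockopen defaults to tcp).
function httpGet(string $host, string $path = '/', int $port = 80): string
{
    $socket = fsockopen($host, $port, $errno, $errstr, 5);
    if ($socket === false) {
        throw new RuntimeException("Connection failed: {$errstr} ({$errno})");
    }
    fwrite($socket, buildHttpRequest($host, $path));
    $response = stream_get_contents($socket);   // read until the server closes
    fclose($socket);
    return $response;
}

// Usage (needs network access):
// echo httpGet('example.com');
```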
Avoid the old sockets extension unless you really really know what you’re doing
Most of the things you used to need the sockets extension for you no longer do
Those last two functions, stream_socket_server and stream_socket_client, make a client/server relationship really easy with much less code
It’s sometimes hard to find examples on the stream_socket stuff since most of the old stuff on the internet still uses the sockets extension
Don’t follow their lead; take the time to read the PHP documentation and use the new APIs
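As a starting point, here is a minimal sketch of stream_socket_server and stream_socket_client talking over loopback inside a single script. The function name echoRoundTrip() is hypothetical; a real server would run in its own process and loop on stream_socket_accept().

```php
<?php
function echoRoundTrip(string $message): string
{
    // Port 0 asks the OS to pick any free port on loopback.
    $server = stream_socket_server('tcp://127.0.0.1:0', $errno, $errstr);
    if ($server === false) {
        throw new RuntimeException("Listen failed: {$errstr} ({$errno})");
    }
    $address = stream_socket_get_name($server, false);  // local "ip:port"

    $client = stream_socket_client("tcp://{$address}", $errno, $errstr, 5);
    if ($client === false) {
        throw new RuntimeException("Connect failed: {$errstr} ({$errno})");
    }
    $connection = stream_socket_accept($server, 5);

    // From here on, both ends are ordinary streams.
    fwrite($client, $message);
    $request = fgets($connection);
    fwrite($connection, strtoupper($request));
    $reply = fgets($client);

    fclose($connection);
    fclose($client);
    fclose($server);
    return $reply;
}

echo echoRoundTrip("ping\n");
```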
There is SOOO much more you can do from hooking objects to hooking the engine!