WARNING
This is a draft
I haven’t worked with Erlang for at least 6 months! My last Erlang project, probably one of the most ambitious things I have made in a long time, was in stand-by mode. So, what’s the deal with this project?
Erlang gives you so many ways to make your code simple and easy to understand that some of my impossible dreams are becoming real. First things first: this project started in December, while I was talking with a colleague about a ZFS stream proxy… Yeah, sometimes projects come from crazy ideas. After this discussion, I realized I had had practically the same idea years before, for another of my projects based on FreeBSD, a distributed jail service.
At that time, I wasn’t aware of all the complexity, but I started by implementing a high-level view of the ZFS stream data structure, to easily read a snapshot dump. My PoC is working, but it still needs a lot of features. I can now read any kind of data coming from snapshots. I will probably write an article on this part, it was so interesting!
The next part is the forwarder. I like the DRY (Don’t Repeat Yourself) philosophy, and a lot of my code is designed to be reused elsewhere without change. I was thinking of creating an Erlang application dedicated only to this task. So, what do we need to accomplish it?
First, we need a collector: this piece gets all the data from the outside world. The collector can be active (it creates a new connection and maintains its state) or passive (it acts as a server, waiting for data from the outside world).
We need a sender: if you have some data coming into your system and want to forward it to one or more other places, you need some code to send the incoming data to an end-point. This sender is active and acts as a client, sending its data somewhere else.
We need a manager: this one manages and assists all data transfers by giving orders to the collector and the sender.
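To make these three roles a bit more concrete, here is a minimal sketch. It is written in Python only for illustration (the real project is in Erlang), and every name in it (`Collector`, `Sender`, `Manager`, `chunks`, `send`, `run`) is hypothetical. The collector reads from any iterable of byte chunks instead of a real socket, and each sender calls a plain function instead of opening a connection.

```python
from typing import Callable, Iterable, Iterator, List


class Collector:
    """Gets raw chunks from the outside world (here: any iterable of bytes)."""

    def __init__(self, source: Iterable[bytes]) -> None:
        self._source = source

    def chunks(self) -> Iterator[bytes]:
        for chunk in self._source:
            if chunk:  # ignore empty reads
                yield chunk


class Sender:
    """Forwards chunks to one end-point (here: any callable taking bytes)."""

    def __init__(self, deliver: Callable[[bytes], None]) -> None:
        self._deliver = deliver
        self.sent_bytes = 0

    def send(self, chunk: bytes) -> None:
        self._deliver(chunk)
        self.sent_bytes += len(chunk)


class Manager:
    """Drives the transfer: takes chunks from the collector, hands them to senders."""

    def __init__(self, collector: Collector, senders: List[Sender]) -> None:
        self._collector = collector
        self._senders = senders

    def run(self) -> int:
        """Forward everything and return the number of bytes collected."""
        total = 0
        for chunk in self._collector.chunks():
            for sender in self._senders:
                sender.send(chunk)
            total += len(chunk)
        return total
```

Something like `Manager(Collector(source), [Sender(out_a), Sender(out_b)]).run()` would then copy every chunk to both end-points, one chunk at a time, without ever holding the whole stream.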
The last part, the biggest and most complex one, is the parser. If you have raw data coming from the outside world, you’ll probably want to route it based on some defined or undefined pattern. For a small amount of data, this task is not really hard: you can store it in memory temporarily, parse it and search for your information. The problem becomes harder when you have big data (e.g. more than 100MB) entering your pipe! In this case, how do you store it (it will not fit in memory), how do you parse it (if it isn’t in memory, how do you parse it and search for some states), and finally, how do you send it to the sender?
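For the simplest version of this problem, a fixed byte pattern, one possible approach is to read the stream chunk by chunk and keep only the last `len(pattern) - 1` bytes between two chunks, so a match split across a chunk boundary is still found. The sketch below is in Python for illustration only, it is not the parser of this project, and `find_in_stream` is a hypothetical name.

```python
from typing import Iterable, Iterator


def find_in_stream(chunks: Iterable[bytes], pattern: bytes) -> Iterator[int]:
    """Yield the absolute offset of every occurrence of `pattern` in the stream.

    Memory use stays bounded: one chunk plus len(pattern) - 1 carried-over bytes.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")

    keep = len(pattern) - 1  # bytes to carry over to the next chunk
    tail = b""               # end of the data seen so far
    tail_offset = 0          # absolute offset of tail[0] in the stream

    for chunk in chunks:
        window = tail + chunk
        start = 0
        while True:
            idx = window.find(pattern, start)
            if idx == -1:
                break
            yield tail_offset + idx
            start = idx + 1  # also report overlapping matches

        # The tail is shorter than the pattern, so it can never contain a
        # full match on its own: no offset is reported twice.
        cut = max(len(window) - keep, 0)
        tail = window[cut:]
        tail_offset += cut
```

With the chunks `b"xxAB"`, `b"CDyy"` and `b"ABCD"`, searching for `b"ABCD"` reports offsets 2 and 8, even though the first match is cut in two by a chunk boundary. Routing on patterns that are not known in advance, or that depend on some state, needs more than this.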
I think you get the big picture: we have some raw data coming from anywhere, large in size, and we need a way to find specific patterns in it, even though we can’t store all of it in memory.
I tried to read a lot of papers and books about streams, with a lot of complex mathematics. I don’t really have the level to understand everything, so if you have mathematics experience and time to share, I would be happy to get an introduction to this whole field!