Working with large files using Elixir

Here's an example of how you can use streams to read and manipulate very large files without busting your memory:

[input_file, output_file] = System.argv()

File.stream!(input_file)
# Drop the bulky INSERT statements for tables we don't want to keep.
|> Stream.reject(fn line ->
  case line do
    "INSERT INTO `logs`" <> _ -> true
    "INSERT INTO `states_changes`" <> _ -> true
    _anything -> false
  end
end)
# Strip surrounding whitespace, then restore the trailing newline.
|> Stream.map(fn line ->
  String.trim(line) <> "\n"
end)
|> Stream.into(File.stream!(output_file))
|> Stream.run()

Let's break this down. On line 1 we pattern match on System.argv/0, which returns the list of command-line arguments that were passed in. We could use Elixir's built-in OptionParser, but we don't need it for our purposes. The script is invoked like this:

elixir trim_large.exs input_file.sql output_file.sql
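
Matching on a two-element list raises a MatchError if the script is called with the wrong number of arguments. If a friendlier message is wanted, one option is sketched below. Note that parse_args is a hypothetical helper, and literal lists stand in for System.argv() so the snippet can be pasted into iex.

```elixir
# Hypothetical helper: returns a tagged tuple instead of raising a MatchError.
parse_args = fn
  [input_file, output_file] -> {:ok, input_file, output_file}
  _other -> {:error, "usage: elixir trim_large.exs input_file output_file"}
end

# Literal lists stand in for System.argv().
IO.inspect(parse_args.(["in.sql", "out.sql"]))
IO.inspect(parse_args.(["in.sql"]))
```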

Next we grab a stream from the file using File.stream!/3, which by default yields the file one line at a time, and we use Stream.reject/2 to drop the lines that we don't want. We then remove any leading or trailing whitespace with Stream.map/2 and String.trim/1, re-appending the newline that trimming strips off.
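
To try the filter and trim steps without a large file at hand, here is a small sketch that runs the same two stages over an in-memory list. The sample lines and the `unwanted` helper are made up for illustration.

```elixir
# Hypothetical sample lines standing in for a SQL dump.
lines = [
  "INSERT INTO `logs` VALUES (1);\n",
  "  INSERT INTO `users` VALUES (1);  \n",
  "INSERT INTO `states_changes` VALUES (2);\n",
  "CREATE TABLE `users` (id INT);\n"
]

# Binary prefix patterns only match at the very start of the line.
unwanted = fn
  "INSERT INTO `logs`" <> _ -> true
  "INSERT INTO `states_changes`" <> _ -> true
  _ -> false
end

cleaned =
  lines
  |> Stream.reject(unwanted)
  |> Stream.map(fn line -> String.trim(line) <> "\n" end)
  |> Enum.to_list()

IO.inspect(cleaned)
```

Because the reject step runs before the trim step, a line that starts with whitespace before "INSERT INTO `logs`" would not be dropped by this pipeline.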

We use Stream.into/2 to direct our working stream into the output file. And finally we use Stream.run/1 to "execute" the stream. This is similar to calling Enum.to_list/1 to force the stream, but since the results are written to the file we don't need them returned: Stream.run/1 runs the stream purely for its side effects and returns :ok.
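
A minimal sketch of that last step in isolation, assuming a scratch file in the system temp directory (the file name is made up):

```elixir
# Hypothetical scratch path in the system temp directory.
path = Path.join(System.tmp_dir!(), "stream_into_demo.txt")

pipeline =
  ["a\n", "b\n", "c\n"]
  |> Stream.map(&String.upcase/1)
  |> Stream.into(File.stream!(path))

# Building the pipeline does not run it; Stream.run/1 forces it.
result = Stream.run(pipeline)
contents = File.read!(path)

IO.inspect(result)
IO.inspect(contents)

# Clean up the scratch file.
File.rm!(path)
```

Here `result` is :ok rather than a list, which is the point: the data ends up in the file, not in memory.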

Elixir Streams

Elixir streams are essentially lazy-loaded Enums. You can do a lot of neat things with them.

Calling map from Enum executes that call right away, whereas a stream is composable. For example, you can chain a couple of maps and filters together without actually executing anything until you want to. Here's an example:

a = 1..3

IO.puts "Eager enumeration:"

a
|> Enum.map(fn _x -> IO.puts("x") end)
|> Enum.map(fn _x -> IO.puts("o") end)

IO.puts ""
IO.puts "Now lazy enumeration:"

a
|> Stream.map(fn _x -> IO.puts("x") end)
|> Stream.map(fn _x -> IO.puts("o") end)
|> Enum.to_list()

Running this in iex will print (return values omitted):

Eager enumeration:
x
x
x
o
o
o

Now lazy enumeration:
x
o
x
o
x
o

Enum prints all the x's first and then all the o's, because each Enum.map finishes its whole pass before the next one starts, whereas the Stream prints an x and then an o for every element.
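
One practical consequence of that per-element behavior is that a stream only does as much work as its consumer asks for. The sketch below counts how often the mapping function runs; the Agent-based counter is just an illustrative device.

```elixir
# Counter process used only to observe how many elements get processed.
{:ok, counter} = Agent.start_link(fn -> 0 end)

first_two =
  1..1_000_000
  |> Stream.map(fn x ->
    Agent.update(counter, &(&1 + 1))
    x * 2
  end)
  |> Enum.take(2)

calls = Agent.get(counter, & &1)

IO.inspect(first_two)  # [2, 4]
IO.inspect(calls)      # 2
```

With Enum.map in place of Stream.map, the function would run a million times before Enum.take/2 discarded all but two results.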

You can also do neat things with Stream.unfold/2, like wrapping random number generation in an enumerable interface:

# An infinite stream of random integers between 1 and 100
random = Stream.unfold(nil, fn _ -> {:rand.uniform(100), nil} end)

# Get 10 random numbers
values = Enum.take(random, 10)

# Get only even numbers
even_random =
  random
  |> Stream.filter(fn e -> rem(e, 2) == 0 end)
  |> Enum.take(10)

# Join 10 random numbers into one long string of digits
long_random = random |> Enum.take(10) |> Enum.join()

We pass a stream to Enum functions when we actually want to get the values back, and to Stream functions when we want to keep composing. Be careful not to call Enum.to_list/1 on this random stream: it is infinite, so the call will never return and will eventually exhaust memory.
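
If a list is really what's needed, one way to stay safe is to bound the stream with Stream.take/2 first. A small sketch:

```elixir
random = Stream.unfold(nil, fn _ -> {:rand.uniform(100), nil} end)

# Stream.take/2 is still lazy, but the resulting stream is finite.
bounded = Stream.take(random, 5)

# Safe now: the stream ends after 5 elements.
values = Enum.to_list(bounded)

IO.inspect(length(values))
IO.inspect(Enum.all?(values, fn v -> v in 1..100 end))
```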

Implementing a Stack in Elixir: GenServer vs. send and receive

I wanted to see how I could essentially implement a GenServer in Elixir without actually using a GenServer. The following gist has two implementations: a basic GenServer implementation, and an implementation using send and receive.
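
As a rough illustration of the comparison, here is a sketch of what the two approaches can look like. This is not the gist's actual code: the module names StackServer and StackProcess, the message shapes, and the choice to return nil when popping an empty stack are all assumptions made for the example.

```elixir
defmodule StackServer do
  use GenServer

  # Client API
  def start_link(items \\ []), do: GenServer.start_link(__MODULE__, items)
  def push(pid, item), do: GenServer.cast(pid, {:push, item})
  def pop(pid), do: GenServer.call(pid, :pop)

  # Server callbacks
  @impl true
  def init(items), do: {:ok, items}

  @impl true
  def handle_cast({:push, item}, items), do: {:noreply, [item | items]}

  @impl true
  def handle_call(:pop, _from, []), do: {:reply, nil, []}
  def handle_call(:pop, _from, [head | tail]), do: {:reply, head, tail}
end

defmodule StackProcess do
  # The state lives in the argument of a recursive receive loop.
  def start(items \\ []), do: spawn(fn -> loop(items) end)

  def push(pid, item) do
    send(pid, {:push, item})
    :ok
  end

  def pop(pid) do
    # A unique reference ties the reply to this particular request.
    ref = make_ref()
    send(pid, {:pop, self(), ref})

    receive do
      {^ref, item} -> item
    after
      1_000 -> {:error, :timeout}
    end
  end

  defp loop(items) do
    receive do
      {:push, item} ->
        loop([item | items])

      {:pop, caller, ref} ->
        case items do
          [] ->
            send(caller, {ref, nil})
            loop([])

          [head | tail] ->
            send(caller, {ref, head})
            loop(tail)
        end
    end
  end
end

{:ok, gs} = StackServer.start_link()
StackServer.push(gs, 1)
StackServer.push(gs, 2)
gs_top = StackServer.pop(gs)

sp = StackProcess.start()
StackProcess.push(sp, 1)
StackProcess.push(sp, 2)
sp_top = StackProcess.pop(sp)

IO.inspect({gs_top, sp_top})  # {2, 2}
```

The hand-rolled version has to do by hand what GenServer.call/2 provides for free: tagging each request with a reference, waiting for the matching reply, and timing out if none arrives.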

It's been a while

It's been a year since I've posted anything to this blog.

I am bad and I should feel bad.

Life can be much broader once you discover one simple fact: everything around you that you call life was made up by people no smarter than you.

-- Jobs