Working with large files using Elixir

Here's an example of how you can use streams to read and manipulate very large files without busting your memory:

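[input_file, output_file] = System.argv()

File.stream!(input_file)
|> Stream.reject(fn line ->
  # Drop the bulk INSERT statements for the tables we don't want
  case line do
    "INSERT INTO `logs`" <> _ -> true
    "INSERT INTO `states_changes`" <> _ -> true
    _anything -> false
  end
end)
|> Stream.map(fn line ->
  # Strip surrounding whitespace, then restore the newline
  String.trim(line) <> "\n"
end)
|> Stream.into(File.stream!(output_file))
|> Stream.run()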

Let's break this down. On line 1 we pattern match on System.argv/0, which returns a list of the arguments that were passed in. We could use Elixir's built-in OptionParser, but we don't need it for our purposes. Its usage will look like:

elixir trim_large.exs input_file.sql output_file.sql
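If you did want a friendlier failure than a MatchError when the wrong number of arguments is passed, OptionParser can help. What follows is only a rough sketch and not part of the original script: the TrimLargeArgs module and its usage message are made-up names for illustration.

```elixir
defmodule TrimLargeArgs do
  # Hypothetical helper: returns {:ok, {input, output}} or {:error, usage}.
  def parse(argv) do
    # With `strict: []` no switches are allowed, so the two file names
    # come back as plain arguments in the second tuple element.
    case OptionParser.parse(argv, strict: []) do
      {[], [input_file, output_file], []} ->
        {:ok, {input_file, output_file}}

      _other ->
        {:error, "usage: elixir trim_large.exs INPUT_FILE OUTPUT_FILE"}
    end
  end
end

IO.inspect(TrimLargeArgs.parse(["input_file.sql", "output_file.sql"]))
IO.inspect(TrimLargeArgs.parse(["input_file.sql"]))
```

For a two-argument script the plain pattern match is arguably enough, which is why the script above skips this.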

Next we grab a stream from a file using File.stream!/3 and we use Stream.reject/2 to remove entire lines that we don't want. We then remove any leading or trailing whitespace with Stream.map/2 and String.trim/1.

We use Stream.into/2 to direct our working stream into an output file. And finally we use Stream.run/1 to "execute" the stream. This is similar to calling Enum.to_list/1 to return the results, but since we only care about the side effect of writing the file we use a slightly different mechanism.
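To try the pipeline without a multi-gigabyte dump, here is a small sketch that runs the same steps against a three-line sample. The file names and sample rows are invented for the example, and String.starts_with?/2 stands in for the case expression purely to keep it short.

```elixir
# Hypothetical sample files in the system temp directory.
tmp = System.tmp_dir!()
input_file = Path.join(tmp, "trim_large_sample_in.sql")
output_file = Path.join(tmp, "trim_large_sample_out.sql")

File.write!(input_file, Enum.join([
  "INSERT INTO `logs` VALUES (1);\n",
  "  INSERT INTO `users` VALUES (2);  \n",
  "INSERT INTO `states_changes` VALUES (3);\n"
]))

input_file
|> File.stream!()
|> Stream.reject(fn line ->
  String.starts_with?(line, ["INSERT INTO `logs`", "INSERT INTO `states_changes`"])
end)
|> Stream.map(fn line -> String.trim(line) <> "\n" end)
|> Stream.into(File.stream!(output_file))
|> Stream.run()

# Only the `users` row should survive, with its padding removed.
IO.write(File.read!(output_file))
```

Nothing is read or written until Stream.run/1 is called; everything before it only describes the work.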

Elixir streams

Elixir streams are essentially lazy-loaded Enums. You can do a lot of neat things with them.

Calling map on an Enum will execute that call in place, whereas a stream is composable. For example, you can chain a couple of maps and selects together without actually executing anything until you want to. Here's an example:

a = 1..3

IO.puts "Eager enumeration:"

a
|> Enum.map(fn _x -> IO.puts("x") end)
|> Enum.map(fn _x -> IO.puts("o") end)

IO.puts ""
IO.puts "Now lazy enumeration:"

a
|> Stream.map(fn _x -> IO.puts("x") end)
|> Stream.map(fn _x -> IO.puts("o") end)
|> Enum.to_list

Running this in iex will produce:

Eager enumeration:
x
x
x
o
o
o

Now lazy enumeration:
x
o
x
o
x
o

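Enum prints all the x's first and then the o's, because each Enum.map call walks the whole collection before the next one starts. The Stream prints x and then o for every iteration, because each element passes through the entire chain before the next element is touched.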
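A practical consequence of this per-element behaviour is that a stream only does as much work as the consumer asks for. Here's a small sketch (the range size and the doubling function are arbitrary choices for illustration):

```elixir
result =
  1..1_000_000
  |> Stream.map(fn x ->
    # Printed only for the elements that are actually pulled through.
    IO.puts("mapping #{x}")
    x * 2
  end)
  |> Stream.filter(fn x -> rem(x, 4) == 0 end)
  |> Enum.take(2)

IO.inspect(result)
# => [4, 8]
```

Swap the Stream calls for Enum and the map step has to visit all one million elements before the filter even starts.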

You can also do neat things with Stream.unfold/2, like wrapping random number generation in an enumerable interface:

# :rand replaces the deprecated :random module
random = Stream.unfold(nil, fn _ -> {:rand.uniform(100), nil} end)

# Get 10 random numbers
values = Enum.take(random, 10)

# Get only even numbers
even_random =
  random
  |> Stream.filter(fn e -> rem(e, 2) == 0 end)
  |> Enum.take(10)

# Make a long random number
long_random = Enum.take(random, 10) |> Enum.join

We pass a stream to an Enum function when we actually want to return the values, and use Stream functions when we want to keep composing the stream. Be careful not to call Enum.to_list/1 on this random stream: it is infinite, so the call will never return and will eventually exhaust memory.
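The random example ignores the accumulator, but Stream.unfold/2 gets more interesting when the state is actually threaded through. As a sketch, here is a Fibonacci stream (my own example, not from the snippet above), along with one way to make an infinite stream safe for Enum.to_list/1 by bounding it first:

```elixir
# State is {current, next}; each step emits `current` and advances the pair.
fib = Stream.unfold({0, 1}, fn {a, b} -> {a, {b, a + b}} end)

first_ten = Enum.take(fib, 10)
IO.inspect(first_ten)
# => [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# Bounding the stream first makes Enum.to_list/1 terminate.
below_100 = fib |> Stream.take_while(fn n -> n < 100 end) |> Enum.to_list()
IO.inspect(below_100)
# => [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```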

Implementing a stack in Elixir: PID vs GenServer

I wanted to see how I could essentially implement a GenServer in Elixir without actually using a GenServer. The following gist has two implementations: a basic GenServer implementation, and an implementation using send and receive.
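For a rough idea of what the two approaches look like side by side, here is a minimal sketch. It is not the code from the gist: the module names GenStack and RawStack, the nil-on-empty behaviour, and the one-second timeout are all assumptions made for this example.

```elixir
defmodule GenStack do
  use GenServer

  # Client API
  def start_link(items \\ []), do: GenServer.start_link(__MODULE__, items)
  def push(pid, item), do: GenServer.cast(pid, {:push, item})
  def pop(pid), do: GenServer.call(pid, :pop)

  # Server callbacks
  @impl true
  def init(items), do: {:ok, items}

  @impl true
  def handle_cast({:push, item}, items), do: {:noreply, [item | items]}

  @impl true
  def handle_call(:pop, _from, []), do: {:reply, nil, []}
  def handle_call(:pop, _from, [head | tail]), do: {:reply, head, tail}
end

defmodule RawStack do
  # The same stack, with the state held by a recursive receive loop.
  def start(items \\ []), do: spawn(fn -> loop(items) end)

  def push(pid, item) do
    send(pid, {:push, item})
    :ok
  end

  def pop(pid) do
    # The unique ref lets us match the reply to this particular request.
    ref = make_ref()
    send(pid, {:pop, self(), ref})

    receive do
      {^ref, item} -> item
    after
      1_000 -> {:error, :timeout}
    end
  end

  defp loop(items) do
    receive do
      {:push, item} ->
        loop([item | items])

      {:pop, caller, ref} ->
        case items do
          [] ->
            send(caller, {ref, nil})
            loop([])

          [head | tail] ->
            send(caller, {ref, head})
            loop(tail)
        end
    end
  end
end

{:ok, gen} = GenStack.start_link([1])
GenStack.push(gen, 2)
IO.inspect(GenStack.pop(gen))
# => 2

raw = RawStack.start([1])
RawStack.push(raw, 2)
IO.inspect(RawStack.pop(raw))
# => 2
```

The hand-rolled version makes it clear that the "state" is just the argument of a tail-recursive function. What GenServer adds on top is the plumbing this sketch leaves out, such as supervision integration, call timeouts, and monitoring of the server during a call.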

How to push and set an upstream branch in Git

I like to use hub, mainly for creating pull requests from the console. To do so you have to push and set an upstream branch: git push -u origin CURRENT_BRANCH_NAME. I always give the remote branch the same name as my local branch, and typing it out became tedious, so I created a shortcut: git pu.

Basic command:

git rev-parse --abbrev-ref HEAD | xargs git push -u origin

Git alias:

git config --global alias.pu \
'!sh -c "git rev-parse --abbrev-ref HEAD | xargs git push -u origin"'

Life can be much broader once you discover one simple fact: everything around that you call life, was made up by people no smarter than you.

-- Jobs