Parsing Logentries output safely using Rust
I'm fascinated by Rust for it's safety and speed, but also because it's simple to write low level code in what feels like a high level language. To that end, I've been working on a small Rust project at TotallyMoney.com (where I work) for the last week or so to see if it's viable for production use. It's a simple service that polls a Logentries endpoint for JSON, parses it and saves some values in a Postgres database. It's not a very complicated task, but I saw this as a good opportunity to try Rust in a production-ish role.
Header photo by Sandra Starke
Note: I could very well have written this in NodeJS like the rest of the app it fits into, but I wanted to learn Rust a little better. The memory safety of Rust is somewhat lost on this task but it's interesting how the language handles errors and optional types. Read on for more.
I'm not very good at Rust (yet), so I may be writing terrible Rust code. Let me know @jam_waffles if I've made any glaring mistakes!
Dependencies
I'm assuming you've got Rust, Cargo and a project folder (cargo init --bin
will do) set up for
this project. I'm going to use the following crates:
hyper
; HTTP library for making GET requests to Logentrieschrono
; Date and time, we'll be using it to store the log entry timestamptime
; More date and time, used in this case for sleeping for a certain number of millisecondsrustc_serialize
; JSON parsing
My Cargo.toml
looks like this:
[]
= "logentries_poller"
= "0.1.0"
= ["James Waples <jamwaffles@gmail.com>"]
[]
= "^0.8.1"
= "^0.3.19"
= "^0.2"
= "^0.1"
Crates will be installed/updated when you run cargo build
or cargo run
for the first time.
Pretty neat!
Fetching Logentries data
First, we need to fetch some stuff from Logentries to parse. Logentries provides a simple GET endpoint which returns lines of a particular log. Their documentation specifies the URL as something like this:
https://pull.logentries.com/YOUR_LOGENTRIES_ACCOUNT_KEY/hosts/YOUR_LOG_SET_NAME/YOUR_LOG_NAME/?start=-10000
For our purposes we only want the last 10 seconds of data (?start=-10000
). The code will poll
Logentries every 5 seconds, so requesting the last 10 seconds worth of data will provide a crude
mechanism to deal with failures.
Let's write some Rust. Put this in src/main.rs
:
extern crate hyper;
use ;
Try running the code with cargo run
. With any luck, you should have the last 10 seconds worth of
Logentries submissions printed in your console. In my case, I see lines like this:
2016-04-26 15:18:23.1185 | lvl:INFO | ------------ src:HddRestGateway | { "message": { "status": "Success", "elapsed": 1089 } }
2016-04-26 15:17:53.9890 | lvl:INFO | ------------ src:HddRestGateway | { "message": { "status": "NotFound", "elapsed": 2443 } }
2016-04-26 15:17:56.8014 | lvl:INFO | ------------ src:HddRestGateway | { "message": { "status": "Success", "elapsed": 1375 } }
2016-04-26 15:17:38.9468 | lvl:INFO | ------------ src:HddRestGateway | { "message": { "status": "Timeout", "elapsed": 5897 } }
2016-04-26 15:18:38.6810 | lvl:INFO | ------------ src:HddRestGateway | { "message": { "status": "NotFound", "elapsed": 1100 } }
2016-04-26 15:17:52.6964 | lvl:INFO | ------------ src:HddRestGateway | { "message": { "status": "Success", "elapsed": 2246 } }
2016-04-26 15:18:14.7228 | lvl:INFO | ------------ src:HddRestGateway | { "message": { "status": "Success", "elapsed": 2987 } }
The rest of the code in this article assumes the data is in this format. It should be reasonably simple to change the code to suit your needs. Once we have the log output, it's just a matter of parsing strings however you see fit.
Filtering and parsing each line
Each line contains some fields separated by a pipe character, the first of which is a timestamp and
the latter of which is some JSON data. We're not interested in any of the other bits. We're also
only interested in log lines with { "status": "Success" }
, so we'll do some simple filtering after
parsing the line.
Our program now becomes the following:
extern crate hyper;
extern crate chrono;
extern crate time;
extern crate rustc_serialize;
use thread;
use env;
use Duration;
use ;
use Error;
use Json;
use ;
use ;
This is where Rust's safety starts to shine and look incredibly verbose at the same time. The code
above deals with a lot of Result<>
and Option<>
types, which return Ok()
or Err()
and
Some()
or None()
respectively. Rust checks at compile time to make sure we're matching every
possible return type, meaning we have to handle the Err()
or None()
cases in the code above.
Sometimes that means setting a sensible default (0
for elapsed time, for example) or continue
ing
if the code can't do anything proper with the current line. This code is going to run as a
persistent service so it makes sense to continue
or otherwise bail on processing the current line
instead of, say, panic!
ing and quitting the program.
Running the program again with cargo run
, you should see something like the following:
------ Status: Success, elapsed time: 1089ms at 2016-04-26 15:18:23.1185
Ignoring status NotFound
------ Status: Success, elapsed time: 1375ms at 2016-04-26 15:17:56.8014
Ignoring status Timeout
Ignoring status NotFound
------ Status: Success, elapsed time: 2246ms at 2016-04-26 15:17:52.6964
------ Status: Success, elapsed time: 2987ms at 2016-04-26 15:18:14.7228
Further explanation
Code comments aren't very good for learning a language by example, so I'll explain some of the code above.
First, we need to fetch some data from Logentries, in this case using the Hyper HTTP library.
// Create new Hyper HTTP client
let client = new;
// Make the request. This will quit if there is some kind of failure
let response = match client.get.send ;
Hyper's client.get(url).send()
reutrns a Result<>
which we're match
ing on. If the request
fails, it returns Result<Err>
which we're handling by panic!()
ing as there's no clean way to
recover from a failed request (ok so we could just try again using an exponential falloff but let's
keep things simple). If we get a Result<Ok>
, we return the HTTP response from the match
and
store it in response
for use later.
+++
Next, we turn the HTTP response into a BufReader
stream. This step isn't necessary but it might
make parsing a little faster if the HTTP response can be streamed, although I haven't looked into
it.
// Turn response into stream so we can parse it line by line
let reader = new;
+++
Let's get a timestamp. Logentries (at least in my case) outputs timestamps formatted in a
non-standard way, specifically in the milliseconds part. The last part of the stamp is 4 digits long
(tens of microseconds or something?) however Rust's %f
date format placeholder
expects 8 digits, so I'm going to add those on as a string literal along with a fake timezone.
If someone can point out a better way of parsing a timestamp like 2016-04-26 15:18:23.1185
using
DateTime
or the chrono
crate, please let me know.
// First item is timestamp. Turn it into a Chrono::DateTime
let created = match parts.first ;
Notice the nested match
statements in the code above. If everything goes well and we go down the
Some(created)
and Ok(parsed)
branches, that parsed date will be assigned to created
at the top
level.
Finally, if there's an error, the code will call continue
which skips the remaining iteration of
the loop and moves straight on to the next.
The rest of the code is similar to the above examples in terms of how match
is utilised.
Conclusions and next steps
Rust's strict type system caused me quite a bit of friction coming from a loosely-typed Node
ecosystem, but once you get the hang of it the compile time checking and comprehensive, required
error handling makes for very safe code. You'll notice above that only one variable is mut
able.
I'm not doing anything fancy with pointers so it doesn't add much for memory safety, but it allows
the compiler to make sure I'm not doing anything stupid reassigning variables and such.