Getting Busy with Riak

Welcome back to Database Diversity, where the queries are made up and the relationships don’t matter. Today we’re going to talk about a simple yet insanely powerful NoSQL database called Riak. While its core functionality is a key/value datastore, Riak is capable of so much more: full-text querying, MapReduce, and complex clusters. In our examples, we will install Riak on Oracle Enterprise Linux 6.3 and use it to store some data, then retrieve and delete it. These basic operations are very easy; in fact, they would be boring to show from the command line, since they are well documented and involve a few simple curl commands. Instead, we’re going to write, read, and delete via PL/SQL functions.

But first, let’s talk about what makes Riak so cool. Even if you don’t know how to use it, you should know the features; with 25% of the Fortune 50 using it, you never know when you might get asked to support it.

Riak the RESTful

Riak has a fully HTTP/1.1-compliant API. This means you can perform read, write, delete, and query operations using standard PUT (or POST), GET, and DELETE requests. As such, curl is a great tool for working from the command line, but any tool that can make HTTP calls will do.
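
As a quick illustration, here’s a full round trip in curl (a sketch assuming a local node on the default port and a hypothetical ‘demo’ bucket; we’ll install the node itself in a moment):

[root@orahost ~]# curl -X PUT http://127.0.0.1:8098/buckets/demo/keys/hello -H "Content-Type: text/plain" -d "world"
[root@orahost ~]# curl http://127.0.0.1:8098/buckets/demo/keys/hello
world
[root@orahost ~]# curl -X DELETE http://127.0.0.1:8098/buckets/demo/keys/hello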

Riak the Bountiful

On top of the standard HTTP API, Riak also has client libraries for Java, PHP, Python, Ruby, jQuery, node.js, and C/C++. The language of choice is Erlang; after all, Riak is written in it. Between the RESTful API and the other language support, Riak can be used from nearly anywhere.

Riak the Scalable

A huge selling point for Riak is its scalability. Adding a node to a cluster is as simple as your first install, and doing so yields near-linear performance gains on complex functions and under heavy concurrency. Data replication provides very robust fault tolerance (the cluster remains fully accessible even with many failed nodes), and Riak’s distribution algorithm spreads data evenly across nodes for predictable performance.
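
For instance, growing the cluster takes just a few commands on the new node (a sketch assuming the staged clustering commands introduced in Riak 1.2, with an existing node named riak@192.168.56.101):

[root@orahost2 ~]# riak-admin cluster join riak@192.168.56.101
[root@orahost2 ~]# riak-admin cluster plan
[root@orahost2 ~]# riak-admin cluster commit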

Riak the Free

Riak is open source, though its parent company Basho does offer paid options. Riak Enterprise adds multi-datacenter replication, monitoring, and support. For storage, Riak Cloud Storage offers an S3-API-compatible storage engine with replication capabilities. If you don’t need these services, you can scale Riak to your heart’s content for nothing.

Installing Riak

Now that I’ve sung its praises, let’s get down to the install. I have a simple Oracle Enterprise Linux 6.3 x86_64 VM that I spun up for the purpose, using a standard server install. In order to install Riak, we must:

  • Install the expect library
  • Download Riak 1.3 from Basho’s S3 bucket
  • Install Riak via RPM
  • Raise the open file limit to 4096
  • Start Riak
[root@orahost ~]# yum install expect
...
Installed:
  expect.x86_64 0:5.44.1.15-4.el6

[root@orahost ~]# wget http://s3.amazonaws.com/downloads.basho.com/riak/1.3/1.3.0/rhel/6/riak-1.3.0-1.el6.x86_64.rpm
--2013-03-03 19:01:08--  http://s3.amazonaws.com/downloads.basho.com/riak/1.3/1.3.0/rhel/6/riak-1.3.0-1.el6.x86_64.rpm
Resolving s3.amazonaws.com... 72.21.195.97
Connecting to s3.amazonaws.com|72.21.195.97|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 23875120 (23M) [application/x-rpm]
Saving to: “riak-1.3.0-1.el6.x86_64.rpm”

100%[============================================================================================================================>] 23,875,120  6.64M/s   in 4.0s

2013-03-03 19:01:12 (5.64 MB/s) - “riak-1.3.0-1.el6.x86_64.rpm” saved [23875120/23875120]

[root@orahost ~]# rpm -Uvh riak-1.3.0-1.el6.x86_64.rpm
Preparing...                ########################################### [100%]
   1:riak                   ########################################### [100%]

[root@orahost ~]# ulimit -n 4096
[root@orahost ~]# riak start
Attempting to restart script through sudo -H -u riak

And we’re done: Riak is installed. At this point it is fully usable for reads and writes.
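
A quick way to confirm the node is alive is Riak’s /ping endpoint:

[root@orahost ~]# curl http://127.0.0.1:8098/ping
OK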

How Riak Storage Works

Riak data is stored inside buckets, flat namespaces you can think of as tables. Bucket names are unique within a Riak installation, and each bucket can hold oodles of key/value pairs. Keys are like primary keys, unique within a bucket. Values are where the magic happens: they can be anything. Strings, XML, JSON, binary, you name it. You can use Riak to store session data (login, expiry, preferences, etc.), log files, backups, large XML files, whatever. The power to quickly store and retrieve data of any kind is where Riak shines.
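
If you’re curious what a bucket holds, Riak can list its keys over HTTP (fair warning: this walks the entire keyspace, so it is expensive on large buckets; the ‘logdata’ bucket here is just an example name):

[root@orahost ~]# curl http://127.0.0.1:8098/buckets/logdata/keys?keys=true
{"keys":["1"]}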

Now, when an Oracle or other RDBMS person looks at this, they may wonder why it’s so appealing; after all, you can create a table with a KEY column and a VALUE column and use it for the same purpose. But the strength of Riak is in its scalability and flexibility. It’s not meant to take the place of a relational database, but it can be a great tool alongside one (or on its own).

Let’s PUT in Some Data!

To write to Riak, we PUT to a URL of the form http://hostname:8098/buckets/bucketname/keys/key (note that 8098 is the default port). The body of the HTTP PUT request is the data to store in the bucket under the key you specified. For instance, to write into a bucket called ‘logdata’ on localhost with a numeric key, you could use: http://localhost:8098/buckets/logdata/keys/1.

There are a few required/recommended headers, but the ones we’ll concern ourselves with here are Content-Type and Content-Length. To make the request, we will use the UTL_HTTP package built into Oracle.

Before we can do that, we have to connect to Oracle as SYS and set up an Access Control List (ACL):

BEGIN
  dbms_network_acl_admin.create_acl (
    acl          => 'riak.xml',
    description  => 'Permissions to access riak datastore',
    principal    => 'RIAK_USR',
    is_grant     => TRUE,
    privilege    => 'connect'
  );
end; 
/

begin
  dbms_network_acl_admin.assign_acl (
    acl => 'riak.xml',
    host => '127.0.0.1'
  );
end;
/

In the two PL/SQL blocks above, we create an ACL called riak.xml that grants the RIAK_USR user the ‘connect’ privilege, then assign that ACL to the host 127.0.0.1. With that done, RIAK_USR can run UTL_HTTP operations against localhost.
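
To sanity-check the grant, you can query the standard data dictionary view for network ACLs:

SQL> select acl, principal, privilege from dba_network_acl_privileges where principal = 'RIAK_USR';

You should see a single row granting ‘connect’ to RIAK_USR.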

Here is the function I’ve created to do this:

create or replace function testput(pv_bucket in varchar2, pv_key in varchar2, pv_data in varchar2)
return varchar2
is
  lv_req utl_http.req;
  lv_rsp utl_http.resp;
  lv_out  varchar2(32767);
begin
  -- open a PUT request against the bucket/key URL
  lv_req := utl_http.begin_request(url => 'http://127.0.0.1:8098/buckets/' || pv_bucket || '/keys/' || pv_key,
                                   method => 'PUT');
  -- Content-Length must be the byte length (lengthb), not the character length
  utl_http.set_header(lv_req, 'Content-Type', 'text/plain');
  utl_http.set_header(lv_req, 'Content-Length', lengthb(pv_data));
  -- send the body, then capture the HTTP status code from the response
  utl_http.write_text(lv_req, pv_data);
  lv_rsp := utl_http.get_response(lv_req);
  lv_out := lv_rsp.status_code;
  utl_http.end_response(lv_rsp);

  return lv_out;

exception when utl_http.end_of_body then
  utl_http.end_response(lv_rsp);
  return lv_out;

end;
/

In this function, you pass in a bucket name, key, and value. The function opens a PUT request, sets the headers, and writes the body with the utl_http.write_text procedure. The value returned is the status code from Riak (204 is normal for a successful write).

SQL> select testput('newbucket', '1', 'My test document') from dual;

TESTPUT('NEWBUCKET','1','MYTESTDOCUMENT')
--------------------------------------------------------------------------------
204

GET It?

Using a GET operation I can pull data by key, like with this function:

create or replace function testget(pv_bucket in varchar2, pv_key in varchar2)
return varchar2
is
  lv_req utl_http.req;
  lv_rsp utl_http.resp;
  lv_out  varchar2(32767);
begin
  -- open a GET request against the bucket/key URL
  lv_req := utl_http.begin_request(url => 'http://127.0.0.1:8098/buckets/' || pv_bucket || '/keys/' || pv_key,
                                   method => 'GET');
  -- read the response body (the stored value)
  lv_rsp := utl_http.get_response(lv_req);
  utl_http.read_text(lv_rsp, lv_out);
  utl_http.end_response(lv_rsp);

  return lv_out;

exception when utl_http.end_of_body then
  utl_http.end_response(lv_rsp);
  return lv_out;

end;
/

This function looks an awful lot like the PUT function. It retrieves data from Riak by bucket and key with a GET operation and returns the body of the response.

SQL> select testget('newbucket', '1') from dual;

TESTGET('NEWBUCKET','1')
--------------------------------------------------------------------------------
My test document

And to round things out, let’s delete that key (and verify).

SQL> select testdelete('newbucket', '1') from dual;

TESTDELETE('NEWBUCKET','1')
--------------------------------------------------------------------------------
204

SQL> select testget('newbucket', '1') from dual;

TESTGET('NEWBUCKET','1')
--------------------------------------------------------------------------------
not found

If you guessed we did this with an HTTP DELETE operation, you’d be right:

create or replace function testdelete(pv_bucket in varchar2, pv_key in varchar2)
return varchar2
is
  lv_req utl_http.req;
  lv_rsp utl_http.resp;
  lv_out  varchar2(4000);
begin
  -- open a DELETE request against the bucket/key URL
  lv_req := utl_http.begin_request(url => 'http://127.0.0.1:8098/buckets/' || pv_bucket || '/keys/' || pv_key,
                                   method => 'DELETE');
  -- capture the HTTP status code (204 on a successful delete)
  lv_rsp := utl_http.get_response(lv_req);
  lv_out := lv_rsp.status_code;
  utl_http.end_response(lv_rsp);

  return lv_out;

exception when utl_http.end_of_body then
  utl_http.end_response(lv_rsp);
  return lv_out;

end;
/

With those three functions, we can store (and update), retrieve, and delete any data we want. You can store VARCHAR2 and CLOB data, BLOBs as binary data, XML from DBMS_XMLGEN, JSON built with the Alexandria PL/SQL library, and really anything else you can think of. This data can be used for programming logic (like session data storage) or for later analysis with MapReduce.
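
For example, here’s a sketch of pushing generated XML through the same function (DBMS_XMLGEN is standard; the ‘xmlbucket’ name is made up, and the substr keeps the generated CLOB within testput’s VARCHAR2 limit):

SQL> select testput('xmlbucket', 'dual', substr(dbms_xmlgen.getxml('select * from dual'), 1, 4000)) from dual;

A proper helper would set the Content-Type to application/xml, but text/plain is fine for a demo.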

And because they look so much alike, it would be easy to wrap them in a nice Riak helper package. A project for another day, perhaps.
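
As a rough sketch, a spec for such a package might look like this (the package and function names here are my own invention):

create or replace package riak_client as
  -- store or update a value; returns the HTTP status code (expect 204)
  function put_value(pv_bucket in varchar2, pv_key in varchar2, pv_data in varchar2) return varchar2;
  -- fetch a value by bucket and key; returns the document body
  function get_value(pv_bucket in varchar2, pv_key in varchar2) return varchar2;
  -- delete a value; returns the HTTP status code (expect 204)
  function delete_value(pv_bucket in varchar2, pv_key in varchar2) return varchar2;
end riak_client;
/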

Conclusion

Even plain key/value storage in a free, highly distributed, and redundant database is useful and impressive. But as I’ve said, there’s a lot more you can do with Riak. For instance:

  • Riak Search allows you to perform full-text searches across the values of a bucket. This is an indexed search that is not quite as powerful as MapReduce, but it performs better and more predictably.
  • Secondary Indexes (2i) allow you to pick and choose what will be indexed and how. You define tags for your data, which lets you search it on your own terms; for instance, for a bucket full of emails, you can add tags for Twitter handles and blog addresses, then search on those.
  • Links allow one-way relationships between objects and can be walked to form loose, graph-like relationships.
  • MapReduce is the big kahuna. As with most systems, you can use a Map to gather a single set of data or Reduce functions to further aggregate the results. Riak supports MapReduce functions in JavaScript and Erlang (see the curl sketch just after this list).
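
To give a taste of that last bullet, here is what a whole-bucket MapReduce request looks like over HTTP (a sketch using Riak’s built-in Riak.mapValuesJson JavaScript function against the hypothetical ‘logdata’ bucket; like key listing, feeding a full bucket as the inputs is expensive):

[root@orahost ~]# curl -X POST http://127.0.0.1:8098/mapred -H "Content-Type: application/json" -d '{"inputs":"logdata","query":[{"map":{"language":"javascript","name":"Riak.mapValuesJson"}}]}'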

I plan on publishing some MapReduce examples in the near future, but for now I value my sanity and wish to go on pretending GROUP BY CUBE is the best aggregating mechanism since they fired up the hazelnut chocolate mixer at the Nutella plant.
