Romancing the Confluent Platform 2.0 with Apache Kafka 0.9 & InfluxDB: A Simple Producer and Consumer Example


Thou Shalt Publish…Thou Shalt Subscribe…

For as long as there has been the printed word, there have been publishers and consumers. In ancient times the King's scribes were the publishers; the pigeon, the courier or transport; and the remote Lords of the Houses, the consumers or subscribers. In modern times, in the digital era, data is securely and reliably published and selectively subscribed. In other words, the publish/subscribe paradigm is not new; it's old.

Today's messaging systems, such as TIBCO, Java Message Service (JMS), RabbitMQ, and Amazon SQS, are examples of frameworks and platforms built on this paradigm for publishing and subscribing to data and for writing distributed streaming applications.

Add to that list a real-time streaming platform, and you get a scalable, fault-tolerant, and reliable messaging infrastructure with low latency, allowing you to build and connect your disparate data sources for both real-time and batch applications quickly and easily. One such data streaming and messaging platform is Confluent Platform.

A key challenge for all these messaging systems, beyond scale, reliability, and security, is whether they can guarantee that the right data goes to the right place within an acceptable latency.

Confluent founders declare that it’s their mission to “make sure data ends up in all the right places.”

I had a go at it to ascertain, as a developer, what the "get-started" experience is like. For any platform today, "get-started" is your initial feel for what to expect; it's imperative that the experience is positive, unimpeded, and unencumbered; it's akin to a first date, if you will. And as you know, first impressions do matter!

The Rule of Three

The central notion and litmus test are simple: how easy is it for any developer to do the following:

  1. Download, Install and configure the platform
  2. Run it in local mode, not cluster mode
  3. Write, with relative ease, a first Hello World equivalent in the supported SDK, in the language of choice

I abide by the motto: less friction in development invariably leads to more (and more rapid) adoption and acceptance.

My goal on this first date is not to explore and embrace all the features of the platform; it is to write a simple, putative rendition of a Hello World equivalent of the publish/subscribe programming model, using Confluent Platform 2.0 (CP), backed and supported by the creators of Apache Kafka (0.9.0), which was originally developed at LinkedIn.

My examples are derived from two sources:

Later, I’ll implement an elaborate simulation of disparate data sources, large scale deployment of IoT devices all publishing data, in which I’ll employ CP as the messaging system as I did with PubNub previously.

For now, let's first crawl and have coffee on our first date before we run and have a full-course dinner…

Relevant Files

SimplePublisher.java (Producer)

IoTDevices.java (Helper)

As the name suggests, it's a simple producer that generates state information for a few fake devices and publishes each device record to the CP topic "devices." Three key takeaways. First, for each topic to which you wish to publish messages, you must provide and register an Avro schema. For the duration of the process (and even later), all producers publishing to this topic must adhere to this schema, which is registered and maintained in the Schema Registry.

Second, since by default CP uses Avro serialization/deserialization for messages, you get ser/de for most common data types out of the box.

And finally, the Java client APIs are fairly easy to use (I have not tried other client implementations, but it's worth exploring at least the Scala, Python, or REST clients). I'll leave that as an exercise for other enthusiasts.
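
To give a flavor of what the producer does, here is a minimal, hypothetical sketch of an Avro producer written against the Kafka 0.9 Java client and Confluent's Avro serializer. The schema fields, values, and class name are illustrative assumptions, not the actual code in the repo:

    import java.util.Properties;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class DeviceProducerSketch {
      // Illustrative schema; the real IoTDevices helper defines its own fields.
      private static final String DEVICE_SCHEMA =
          "{\"type\":\"record\",\"name\":\"device\",\"fields\":["
        + "{\"name\":\"device_id\",\"type\":\"string\"},"
        + "{\"name\":\"temperature\",\"type\":\"int\"},"
        + "{\"name\":\"timestamp\",\"type\":\"long\"}]}";

      public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // Confluent's Avro serializer talks to the Schema Registry for us.
        props.put("key.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");

        Schema schema = new Schema.Parser().parse(DEVICE_SCHEMA);
        try (KafkaProducer<Object, Object> producer = new KafkaProducer<>(props)) {
          for (int i = 0; i < 5; i++) {
            GenericRecord device = new GenericData.Record(schema);
            device.put("device_id", "device-" + i);
            device.put("temperature", 20 + i);
            device.put("timestamp", System.currentTimeMillis());
            producer.send(new ProducerRecord<Object, Object>("devices", device));
          }
        }
      }
    }

On the first send, KafkaAvroSerializer registers the record's schema with the Schema Registry (by default under the subject "devices-value"); every subsequent producer for this topic must publish records compatible with that schema.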

To get started, let's compile and create the producer package. In the publisher directory of this GitHub repo, create the package:

  • $ mvn clean package

This will create the jar file in the target directory. Once created, you can follow the Steps To Run below to publish device records.

Command Line Consumer

I get excited when I can learn, try, and do something both programmatically and interactively. No surprise that the Python and Scala read-eval-print loops (REPLs) are such a huge hit with developers, because they allow a developer to prototype an idea quickly. No different are the UNIX shells.

Not since the creators of the UNIX shells (Bourne, csh, and Bash) have computer scientists and software engineers been so inspired to provide an interactive shell as part of their platform environment. Look at PySpark and the Scala spark-shell, where developers can interact, try or test code, and see the results instantly. Why wait for something to compile when all you want is to quickly prototype a function or an object class, test it, and tweak it?

Just as REPLs are developers’ delight, so are CLI tools and utilities that ship with a platform, allowing quick inspection, fostering rapid prototyping, and offering discrete blocks to build upon.

For example, in the CP distribution's bin directory, I can use a number of command line scripts. One such utility allows you to inspect messages on a topic. This means that, instead of writing a consumer yourself for quick inspection, you can easily use one out of the box. It comes in handy, especially if you have just finished publishing a few messages to a topic and are curious to see whether it worked.

To see what you just published on your topic, say devices, run this command:

$ kafka-avro-console-consumer --topic devices --zookeeper localhost:2181 --from-beginning

Another example, and equally useful, is the kafka-simple-consumer-shell, which allows you to interactively inspect your queues and topic partitions.

$ kafka-simple-consumer-shell --broker-list localhost:9092 --topic devices --partition 0 --max-messages 25

Java Subscriber (Consumer)

[Diagram: Kafka publish/subscribe]

To use the Java consumer and insert device messages into the time-series database (InfluxDB), run this command from within the subscriber directory.

`$ mvn exec:java -Dexec.mainClass="com.dmatrix.iot.devices.SubscribeIoTDevices" -Dexec.args="localhost:2181 group devices 1 http://localhost:8081"`
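
For a sense of what the subscribe side involves, here is a minimal, hypothetical sketch using the Kafka 0.9 consumer API with Confluent's Avro deserializer. The repo's SubscribeIoTDevices takes ZooKeeper-style arguments, so treat this purely as an illustrative sketch; the InfluxDB write is left as a comment to keep it self-contained:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class DeviceConsumerSketch {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "group");
        // Confluent's Avro deserializer fetches the writer's schema from the Schema Registry.
        props.put("key.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        props.put("schema.registry.url", "http://localhost:8081");

        try (KafkaConsumer<Object, Object> consumer = new KafkaConsumer<>(props)) {
          consumer.subscribe(Collections.singletonList("devices"));
          while (true) {
            ConsumerRecords<Object, Object> records = consumer.poll(1000);
            for (ConsumerRecord<Object, Object> record : records) {
              GenericRecord device = (GenericRecord) record.value();
              // Here the real subscriber converts each record into a point and
              // writes it to InfluxDB; we simply print it.
              System.out.println(device);
            }
          }
        }
      }
    }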

Requirements

In order to try these examples you must download and install the following on your laptop (mine is a Mac).

Assuming that you have satisfied the above requirements, here is the order of steps to get a Hello World publisher and consumer going. Further, let's assume you have CP installed in ${CONFLUENT_HOME} and its bin directory included in your $PATH.

  1. Start ZooKeeper in one terminal.
    $ cd ${CONFLUENT_HOME} && zookeeper-server-start ./etc/kafka/zookeeper.properties
  2. Launch the Apache Kafka server in a separate terminal.
    $ cd ${CONFLUENT_HOME} && kafka-server-start ./etc/kafka/server.properties
  3. Start the Schema Registry in a separate terminal.
    $ cd ${CONFLUENT_HOME} && schema-registry-start ./etc/schema-registry/schema-registry.properties
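
Optionally, once all three services are running, you can sanity-check the Schema Registry through its REST API; before any producer has run, it should return an empty list of subjects:

    $ curl http://localhost:8081/subjects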

In your publisher directory, where you created the package, execute the following three commands:

  1. Compile the package:

     $ mvn clean package

  2. Execute the packaged class to produce 5 messages:

     $ mvn exec:java -Dexec.mainClass="com.dmatrix.iot.devices.SimplePublisher" -Dexec.args="devices 5 http://localhost:8081"

     This will publish five device messages to the topic 'devices.'

  3. Finally, to see the outcome of what you published, fetch the messages:

     $ cd ${CONFLUENT_HOME} && kafka-avro-console-consumer --topic devices --zookeeper localhost:2181 --from-beginning

At this point, you should see all the messages received in the order published.

In short, the above recipe gets you started romancing with the publish/subscribe paradigm using Confluent Platform 2.0. There's much more to its power and potential for distributed processing of data streams at massive scale. You can read about the history, the motivation, testimonials, use cases, and social proof of this platform, from both a developer's and an infrastructure point of view, if you peruse some of the resources below. The spectrum of large-scale production deployments speaks volumes about Apache Kafka's potential.

As I said earlier, what gets my passion flowing is a positive get-started experience. In general, in my opinion, any platform that takes an inordinate amount of time to install, is difficult to configure and launch, and makes it hard to write your equivalent of a Hello World program signals more friction and resistance, and less appetite for adoption and acceptance.

To that extent, Confluent met my initial, “get-started” needs, with minimal friction.

Try it for yourself.

Watch the Runs

Resources for Further Exploration

 


Filed under Big Data, Distributed Computing, Programming

Apache Mesos: Is it Zero to One Innovation for The Data Center?

Data Center OS

In late 1999, Marc Andreessen, one of the co-founders of Netscape and Loudcloud (Opsware), declared to a handful of engineers huddled around a large table, standing apart from the discarded computer boxes and the jumbled, twisted wires connected to the rack of blinking servers that hosted our initial website:

We want to build the data operating system for the data center.

He further decreed: we want to manage, configure and monitor all systems in our data center.

At the time, the acronym IaaS was unknown. Provisioning tools such as Puppet, Chef, and Ansible were primordial fluids. A handful of us built his vision as our version of IaaS (the Way, The Truth, The Cog, The Nursery, and MyLoudcloud) and offered it to the dot-coms of that era as a managed service—we provisioned, managed, and monitored their entire stack: web servers, app servers, databases, and storage.

What Was Missing?

Marc's vision was prescient then. What he didn't mention was that we want to manage and orchestrate all of the data center's collective compute, disk, and network resources; what he didn't include was the desire to offer the collective resources of all nodes in the data center as a single entity—with no general distinctions, with no partial partitioning, with no labeling. That is, let's treat all the servers as "herds of cattle," not as distinct "named cats."

Data Center As a Single Computer

That latter vision, the data center OS and the entire data center as a single computer, I see today realized and manifested in Mesosphere's DCOS, built upon Apache Mesos. Its co-founder and creator Benjamin Hindman makes a persuasive case with resonant analogies: Mesos to the UNIX kernel, its accompanying components Marathon and Chronos to UNIX's init.d and cron respectively, its container isolation and protection to UNIX's user spaces, and its resource allocation to the UNIX kernel's scheduling, as though all cluster nodes were coalesced into a single computer. And so does David Greenberg in his talk about the evolution of Mesos.

What's more, the most attractive feature for me as a developer (and having worked in the Ops world) is the ease with which you can develop distributed applications on Mesos, and the ease with which you can deploy and dynamically scale your apps—whether you request the resources yourself (by writing a Mesos framework scheduler) or you delegate and deploy them using Marathon's REST APIs with a JSON description (by packaging your apps into a Docker container).

To me that idea may approximate Peter Thiel's Zero-to-One innovation. It is a clear differentiator: the innovative brilliance in any software, framework, or language platform is its simplicity and clarity through abstraction, and its unified SDK in languages endearing to developers.

Consider UNIX (it provided a system API to its developers); take the UNIX kernel (it offered an SDK for myriad device-driver plugins); look to Apache Spark (it has a single, expressive API and SDK to process data at scale); and review Java and J2EE, and what was born out of them. An SDK, like a programming language, has to have a "feel" to it, as James Gosling famously wrote almost two decades ago.

Does Mesos Have a “Feel to It”?

I think so. Let's explore that with an example. Most of you, at some point while learning a new language, have dabbled with the putative "Hello World!" examples. In C, you wrote printf("%s", "Hello World\n"); in Java, System.out.println("Hello World!"); and in Python, print "Hello World!".

In today's complex and distributed world of computing, writing code where chunks of it are shipped to nodes in a cluster to perform a well-defined task is not easy. Writing an equivalent distributed Hello World example, such as Word Count, a distributed command executor, or a crawler, takes some time, some knowledge, and some confidence to overcome the daunting task of distributed programming.

But some frameworks inherently have an easy feel to writing distributed apps. Hadoop's MapReduce, for instance, is not one of them (though it has its purpose and role—and does a phenomenal job for large-scale batch processing). Apache Spark has an easy feel to it, and so does Apache Mesos: the APIs and SDK make it so.

Mesos’ Hello World!

Suppose I wanted to write a simple distributed app that simply executes a pre-defined or specified command on my cluster. I could iterate over all the hosts using ssh; I could also create a crontab on every host.

But that scheme does not scale when I have hundreds of nodes, or when some nodes are out of circulation due to upgrades.

One option: write a distributed command scheduler using the Mesos framework SDK. A second option: employ the Marathon framework and use its scheduler to execute the desired command on the cluster. And a third option: create a Chronos entry for the desired command. In each case, the chosen framework handles orchestration via Mesos.

Option 1: Writing a Mesos Framework Scheduler

As a developer of a Mesos framework scheduler, you implement two interfaces (or principal classes) and override their methods as required. Together, they coordinate all the scheduling and execution of tasks.

For the Scheduler interface, you must implement at least the following methods (a minimal sketch follows the two lists below):

  • public void resourceOffers(…);
  • public void statusUpdate(…);
  • public void frameworkMessage(…);
  • public void registered(…);

For the Executor interface, you should implement at least these three methods, and others as needed:

  • public void launchTask(…);
  • public void registered(…);
  • public void frameworkMessage(…);
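
Here is a minimal, hypothetical sketch of what a command-sampler scheduler might look like against the Mesos Java API. The framework name, command, and resource amounts are illustrative assumptions, most callbacks are stubbed out, and for brevity it relies on Mesos' built-in command executor instead of a custom Executor implementation:

    import java.util.Arrays;
    import java.util.List;
    import org.apache.mesos.MesosSchedulerDriver;
    import org.apache.mesos.Protos;
    import org.apache.mesos.Scheduler;
    import org.apache.mesos.SchedulerDriver;

    // Accepts each resource offer and launches a shell command on the offering node.
    public class CommandSamplerScheduler implements Scheduler {
      private int launchedTasks = 0;

      public void resourceOffers(SchedulerDriver driver, List<Protos.Offer> offers) {
        for (Protos.Offer offer : offers) {
          Protos.TaskID taskId = Protos.TaskID.newBuilder()
              .setValue("cmd-" + (launchedTasks++)).build();
          Protos.TaskInfo task = Protos.TaskInfo.newBuilder()
              .setName("command-sampler")
              .setTaskId(taskId)
              .setSlaveId(offer.getSlaveId())
              .addResources(scalar("cpus", 0.1))
              .addResources(scalar("mem", 32))
              // No custom executor: Mesos' command executor runs the shell command.
              .setCommand(Protos.CommandInfo.newBuilder().setValue("uname -a"))
              .build();
          driver.launchTasks(Arrays.asList(offer.getId()), Arrays.asList(task));
        }
      }

      public void statusUpdate(SchedulerDriver driver, Protos.TaskStatus status) {
        System.out.println("Task " + status.getTaskId().getValue()
            + " is in state " + status.getState());
      }

      public void registered(SchedulerDriver driver, Protos.FrameworkID frameworkId,
                             Protos.MasterInfo masterInfo) {
        System.out.println("Registered with framework id " + frameworkId.getValue());
      }

      public void frameworkMessage(SchedulerDriver driver, Protos.ExecutorID executorId,
                                   Protos.SlaveID slaveId, byte[] data) {}

      // Remaining callbacks required by the Scheduler interface, stubbed out for brevity.
      public void reregistered(SchedulerDriver driver, Protos.MasterInfo masterInfo) {}
      public void offerRescinded(SchedulerDriver driver, Protos.OfferID offerId) {}
      public void disconnected(SchedulerDriver driver) {}
      public void slaveLost(SchedulerDriver driver, Protos.SlaveID slaveId) {}
      public void executorLost(SchedulerDriver driver, Protos.ExecutorID executorId,
                               Protos.SlaveID slaveId, int status) {}
      public void error(SchedulerDriver driver, String message) {}

      private static Protos.Resource scalar(String name, double value) {
        return Protos.Resource.newBuilder()
            .setName(name)
            .setType(Protos.Value.Type.SCALAR)
            .setScalar(Protos.Value.Scalar.newBuilder().setValue(value))
            .build();
      }

      public static void main(String[] args) {
        Protos.FrameworkInfo framework = Protos.FrameworkInfo.newBuilder()
            .setUser("")  // let Mesos fill in the current user
            .setName("CommandSamplerSketch")
            .build();
        // args[0] is the Mesos master, e.g. "127.0.0.1:5050"
        MesosSchedulerDriver driver =
            new MesosSchedulerDriver(new CommandSamplerScheduler(), framework, args[0]);
        driver.run();
      }
    }

A custom Executor implementation follows the same pattern: launchTask(…) performs the work, while registered(…) and frameworkMessage(…) handle coordination with the scheduler.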

Two-way Interaction between the Scheduler and Executor

Implemented and built on the principles of a UNIX kernel, Mesos abstracts all the compute resources of the cluster—CPUs, Memory, Disk, Ports—and orchestrates a two-way scheduling through messages, events, resource offers, and status updates.

This two-way scheduling provides a level of abstraction or indirection between the scheduler framework, Mesos, and the slave nodes. There are two reasons for this approach. First, the level of indirection provides the common functionality every new distributed system re-implements, such as failure detection, task distribution, and task life cycle. By providing an API that leverages this functionality, developers don't have to re-implement it—it's taken care of. Second, this two-level abstraction enables running multiple distributed systems' tasks on the same cluster of nodes and dynamically sharing resources—it enables multi-tenancy and resource optimization.

Using protocol buffers, the slave nodes communicate in this two-way manner with the Mesos master, which then passes messages up to the framework scheduler. Likewise, the framework scheduler sends launch requests for the accepted resources to the Mesos master, which then delegates them to the slave nodes, where the actual tasks are performed.

Command Sampler and DNS Crawler

I have implemented these classes, and you can examine them on my GitHub. The README explains the steps to spin up Mesos master and slave nodes, with Marathon, Chronos, and Mesos installed and configured. You will need VirtualBox, though.

Also, I have another example of a framework scheduler with two executors called mesos-dns-crawler.

(But first, I urge you to watch the aforementioned YouTube presentations from Ben and David: they will provide a conceptual context.)

The diagram below depicts a high-level view of how framework schedulers’ implemented classes interact: available resources are offered from the slave nodes to the framework via the Mesos Master; accepted offers are sent to slave nodes via Mesos master as launch requests for tasks; and status updates and messages percolate back to the scheduler framework.

[Diagram: resource offers, launch requests, and status updates flowing between the framework scheduler, the Mesos master, and the slave nodes]

Option 2: Deploying a Distributed Application in Marathon

Marathon is a Mesos framework, and it's analogous to a cluster-wide init.d for any application or service deployed via this framework. Since it's implemented as a scheduler, it orchestrates all tasks. There are two ways to deploy an application or service on Marathon. One is to use the REST API, and the other is its easy-to-use GUI.

In the above example, we need a JSON description of our command sampler:

[Screenshot: JSON application descriptor for the command sampler]

Next, with Marathon running on your cluster, you can use its REST endpoint to deploy the application on the cluster. At any point you can scale the number of instances up or down. For example,

[Screenshot: Marathon REST call to deploy the application]

This call will deploy four instances of "Command Sampler" on nodes in the cluster.
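
For reference, a Marathon application descriptor and the corresponding REST calls might look like the following; the app id, command, resource values, and Marathon host are illustrative assumptions:

    $ cat command-sampler.json
    {
      "id": "command-sampler",
      "cmd": "uname -a && sleep 300",
      "cpus": 0.1,
      "mem": 32,
      "instances": 4
    }

    $ curl -X POST -H "Content-Type: application/json" \
        --data @command-sampler.json http://localhost:8080/v2/apps

    # Scale the application up (or down) at any time
    $ curl -X PUT -H "Content-Type: application/json" \
        --data '{"instances": 8}' http://localhost:8080/v2/apps/command-sampler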

Similarly, you can deploy the task using the Marathon GUI. The screenshots show (a) configuring the service and (b) scaling the deployed instances.

[Screenshot: configuring the service in the Marathon GUI]

[Screenshot: scaling the deployed instances in the Marathon GUI]

Option 3: Delegating to Chronos

Finally, you can use the third option to accomplish the same task with Chronos. Using its simple GUI, you can configure a command or an executable you want executed as a task at periodic intervals on each node in the cluster.

[Screenshot: configuring a scheduled command in the Chronos GUI]

Conclusion

So far I've shared three distinct ways to deploy a distributed application on Apache Mesos. All three ways use the resources of a cluster, whether through an explicit framework scheduler, through Marathon's cluster-wide scheduler, or through Chronos' scheduler.

From these simple yet illustrative examples, you can see the ease with which you can deploy any distributed application on the cluster in more ways than one, whether that application is a simple command or a complex crawler or data processing engine with multiple executor tasks. You can "feel" its potential and power.

Such simplicity at scale, managing the compute resources of thousands of nodes in a data center or in the cloud as a single logical computer, and such ease of programmability and flexibility at the disposal of distributed application developers and devops, may approach the kind of innovation that takes us from Zero to One!

Resources

In this blog, I did not explore failover, resiliency, or high availability in the face of failures of executors, frameworks, or the Mesos master. Nor did I comment on optimization or reservation of cluster resources, but the links below expound on many aspects of these topics. I encourage you to peruse them.


Filed under Distributed Computing, Programming, Technology

4 Easy Building Blocks for a Big Data Application – Part II

by Jules S. Damji

A few years ago, I was binging on TED Talks when I stumbled upon John Maeda's talk "Designing for Simplicity." An artist, a technologist, and an advocate for design in simplicity, he wrote ten laws that govern simplicity. I embraced those laws; I changed my e-mail signature to "The Best Ideas Are Simple"; and my business cards echo the same motto. And to some extent, the Continuuity Reactor's four building blocks adhere to at least six out of the ten laws of simplicity.


In the last blog, I discussed the four simple building blocks for a big data application and what their equivalent operations are on big data—collect, process, store, and query. In this blog, I dive deep into how I implement a big data application using the Continuuity Reactor SDK. The table below shows the equivalency between big data operations and Continuuity Reactor's building blocks.

Operations as Logical Building Blocks

    Big Data Operation    Reactor Building Block
    Collect               Streams
    Process               Flows & Flowlets (Processors)
    Store                 Datasets
    Query                 Procedures

But first, let’s explore the problem we want to solve, and then use the building blocks to build an application. For illustration, I’ve curtailed the problem to a small data set; however, the application can equally scale to handle large data sets from live streams or log files, and with increased frequency.

Problem

A mobile app wants to display the minimum and maximum temperature of a city in California for a given day. Your backend infrastructure captures live temperature readings every half hour from all the cities in California. For this blog, we limit the temperature readings to only seven days. (In real life, this could be for all the cities, in all countries, around the world, every day, every year—that is lots of data. Additionally, weekly, monthly, and yearly averages and trends could be calculated in real time or batch mode.) The captured data are stored in and read from log files. However, they could easily be read or ingested from a live endpoint, such as http://www.weatherchannel.com.

The TemperatureApp big data application uses Streams to ingest data, transforms data in the Flow & Flowlets, stores data into Datasets, and responds to mobile queries in the Procedures. Below is the application architecture depicting all the various building blocks interconnected.

[Diagram: TemperatureApp architecture with Streams, Flow & Flowlets, Datasets, and Procedures interconnected]

Four Building Blocks = Unified Big Data Application

As I indicated in the previous blog, the application unifies all four building blocks. It's the mother of everything—it's the glue that defines, specifies, configures, and binds all four building blocks into a single unit of execution.

In one Java file, you easily and seamlessly define the main application. In my case, I'll code our big data main app into a single file, TemperatureApp.java. You simply implement the Java interface Application and its configure() method. For example, in the listing below I implement the interface.

[Code listing: TemperatureApp implementing the Application interface and its configure() method]

Ten Java lines of specification and definition code in the configure() method pretty much define and configure the building blocks of this big data app as a single unit of execution—a builder pattern and its intuitive methods make it simple!

Building Block 1: Streams

Think of Streams as entities that allow you to ingest large, raw datasets into your system, in real time or batch mode. With the Java API, you can define Streams easily. Just create them with a single API Call.

    import com.continuuity.api.data.stream.Stream;

    Stream logStream = new Stream("logStream");
    Stream sensorStream = new Stream("sensorStream");

Another way to define, specify, and configure Streams is within the main application, as shown above. An argument (or a name) to the Stream constructor uniquely defines a Stream within an application. Associated with Streams are events, which are generated as data are ingested into a Stream; Stream Events are then consumed by Flowlets within a Flow. To ensure guaranteed delivery to its consumers, namely Flowlets, these events are queued and persisted.

Building Block 2: Processors

Think of Flows & Flowlets as a Directed Acyclic Graph (DAG), where each node in the DAG is an individual Flowlet, the logic that will process, access, and store data into Datasets.

In our TemperatureApp.java above, I created a single Flow instance called RawFileFlow(). Within this instance, I created two Flowlets—RawFileFlowlet() and RawTemperatureDataFlowlet()—and connected them to Streams. The code below shows how a Flow is implemented, how Flowlets are created, and how Streams are connected to Flowlets. And with just another ten lines of easy-to-read Java code in the configure() method, I defined another building block—again, that is simple and intuitive!

For brevity, I have included only the RawFileFlowlet() source. You can download the TemperatureApp sources from GitHub and observe how both Flowlets are implemented. Also, note below in the source listing how Java @annotations are used to indicate which underlying building blocks and resources are accessed and used.

[Code listing: RawFileFlowlet implementation]

Building Block 3: Datasets

For seamless access to the underlying Hadoop ecosystem's storage facilities, Datasets provide high-level abstractions over various types of tables. The relevant tables' Java API provides high-level read-write-update operations, without your needing to know how data is stored or where it's distributed in the cluster. You simply read, write, and modify datasets with simple access methods. All the heavy lifting is delegated to the Continuuity Reactor application platform, which in turn interacts with the underlying Hadoop ecosystem.

For example, in this application, I use a KeyValueTable dataset, which is implemented as an HBase hash map table on top of HDFS. The Reactor Java API abstracts and shields all underlying complex HBase and HDFS interactions from you. You simply create a table, read, write, and update datasets, using the instance methods. In the above flowlets, RawFileFlowlet() and RawTemperatureDataFlowlet(), I use these high-level operations to store data. Additionally, these flowlets could also store temperature data in a TimeSeriesTable so that daily, weekly, monthly, and yearly average temperatures could be calculated for any city anywhere in the world at anytime.

Building Block 4: Procedures

Procedures allow external sources, such as mobile apps or web tools, to access datasets with synchronous calls. Most Procedures access transformed data, after the data have gone through the Flow. A Procedure defines a method with a name and a list of arguments. The arguments are conveyed within a ProcedureRequest, and the response to the query is contained within a ProcedureResponse.

In our app, a mobile app sends the method name (getTemperature), a day of the week, and a city as procedural arguments. Through the ProcedureResponse, it returns the response or an error message.

Just like all other components, Procedures use Java @annotations to access underlying building blocks and resources. The listing below shows how our Procedure is implemented. And as with other building blocks we implemented above, Procedures can be instantiated as multiple instances, where each instance can run anywhere on the cluster—a simple way to scale!

[Code listing: the getTemperature Procedure implementation]

What Now?

Well, first, in order for you to try this app, you’ll need to do the following:

  1. Download the Continuuity Reactor and SDK from here.
  2. Follow instructions and install the local Reactor on your laptop.
  3. Read the Developer’s guide and documentation.
  4. Try and deploy some examples. (The newer version has excellent real life examples.)
  5. Download TemperatureApp from github.
  6. cd to <git directory>/examples/java/temperatures
  7. ant

Second,

  1. Follow the instructions and guidelines in the Continuuity Reactor Quickstart guide for deploying apps.
  2. Deploy the TemperatureApp.jar into the local Reactor.
  3. Inject the path to the directory of temperature data files.
  4. Run or start the Flow.
  5. Make a query.

What’s Next?

The Hadoop 2.x ecosystem, with the YARN framework, from Hortonworks and Cloudera, has revolutionized and simplified building distributed applications. One is no longer limited to the MapReduce framework and paradigm; one can write any type of distributed application that takes advantage of the underlying Hadoop ecosystem.

I’ll explore some fine features in the next series.

As John Maeda said, "Simplicity is about living life with more enjoyment and less pain." I programmed on the Continuuity Reactor platform with more enjoyment and less pain or frustration—and I hope you will too if you take it for a spin.

Resources

http://www.continuuity.com

http://www.continuuity.com/developers/

http://ubm.io/1jSBiuS

http://www.continuuity.com/developers/programming


Filed under Big Data, Programming, Technology

Apache Spark on Hadoop: Learn, Try and Do

Not a day passes without someone tweeting or re-tweeting a blog on the virtues of Apache Spark. Not a week passes without an analyst examining the implications or the hype of Apache Spark in the big data landscape. And not a month passes without hearing an expressive effusion of advocates' spoken words about Apache Spark at meetups.

At a Memorial Day BBQ, an old friend proclaimed:

Spark is the new rub, just as Java was two decades ago. It’s a developers’ delight.

Spark as a distributed data processing and computing platform offers much of what developers desire and delight in—and much more. To the ETL application developer, Spark offers expressive APIs for transforming data and creating data pipelines; to data scientists it offers machine learning libraries; and to data analysts it offers SQL capabilities for queries.

In this blog, I summarize how you can get started, enjoy Spark's delight, and commence a quick journey to Learn, Try, and Do Spark on Open Enterprise Hadoop, with a set of tutorials. But first, a peek into the past…

Look to the past

Incidentally, Java's Dancing Duke recently turned 20. I could not help but reflect on similar effusions among advocates and similar skepticism among analysts back then. At the time, I was working at Sun Microsystems during Java's formative stages.


The delight among Java developers to engage with expressive and extensible Java APIs—for utilities, data structures, threading, networking, and IO—was palpable; the Javadocs were refreshing; and the buzz infectious.

Because the Java language specification, the Java Virtual Machine (JVM), and the Java APIs abstract the lower-level operating system of a targeted platform, Java developers worry less about the complexity of execution on the targeted platform and concentrate more on how they operate on and transform data structures through concrete classes and access methods.

Similarly, big data developers seem to embrace Spark with equal passion and verve. They enjoy its expressive APIs; they like its functional programming capabilities; they delight in its simplicity—in concepts such as RDDs, transformations, and actions, and in the additional components that run atop Spark Core. They expend less energy on low-level Hadoop complexity and spend more on high-level data transformation, through iterative operations on datasets expressed as RDDs.
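
To make those concepts concrete, here is a minimal word-count sketch against Spark's Java API; it is a hypothetical example assuming Spark 1.x (the Spark of this era), Java 8 lambdas, and an illustrative input path:

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class WordCountSketch {
      public static void main(String[] args) {
        // Run locally with two threads; on a cluster you would use spark-submit instead.
        SparkConf conf = new SparkConf().setAppName("WordCountSketch").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> lines = sc.textFile("input.txt");  // illustrative path

        // Transformations: split lines into words, pair each word with 1, sum the counts.
        JavaRDD<String> words = lines.flatMap(line -> Arrays.asList(line.split(" ")));
        JavaPairRDD<String, Integer> counts = words
            .mapToPair(word -> new Tuple2<>(word, 1))
            .reduceByKey((a, b) -> a + b);

        // Action: nothing executes until the results are collected (or saved).
        counts.collect().forEach(pair -> System.out.println(pair._1() + ": " + pair._2()));

        sc.stop();
      }
    }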

Apples and Oranges?

True, while Java is a programming language and Spark is a distributed data processing and computing engine, I could be accused of comparing apples and oranges. But I'm not comparing functionality per se. Nor am I comparing core capabilities or design principles in particular. Java is an extensible language for writing web and enterprise applications and distributed systems, whereas Spark is an extensible ecosystem for distributed data processing.

Rather, I’m comparing their allure and attraction among developers; I’m underscoring what motivates developers’ desire and delight (or inflames their resistance and rebuke) to embrace a language or platform: it is ease of use; ease of development and deployment; unified APIs or SDKs, in languages endearing and familiar to them.

Spark offers that—and much more. With each rapid release, new features accrue. Much of the roadmap we will hear about at Spark Summit 2015 this week.

In the present

Spark on Apache Hadoop YARN enables deep integration with Hadoop and allows developers and data scientists two modes of development and deployment on Hortonworks Data Platform (HDP).

In the local mode—running on a single node, on an HDP Sandbox—you can get started using a set of tutorials put together by my colleagues Saptak Sen and Ram Sriharsha.

  1. Hands on Tour with Apache Spark in Five Minutes. Besides introducing basic Apache Spark concepts, this tutorial demonstrates how to use Spark shell with Python. Often, simplicity does not preclude profundity. In this simple example, a lot is happening behind the scenes and under the hood but it’s hidden from the developer using an interactive Spark shell. If you are a Python developer and have used Python shell, you’ll appreciate the interactive PySpark shell.
  2. Interacting with Data on HDP using Scala and Apache Spark. Building on the concepts introduced in the first tutorial, this tutorial explores how to use Spark with a Scala shell to read data from an HDFS file, perform in-memory transformations and iterations on an RDD, iterate over the results, and then display them inside the shell.
  3. Using Apache Hive with ORC from Apache Spark. While the last two tutorials explore reading data from HDFS and computing in-memory, this tutorial shows how to persist data as Apache Hive tables in ORC format and how to use SchemaRDD and Dataframes. Additionally, it shows how to query Hive tables using Spark SQL.
  4. Introduction to Data Science and Apache Spark. Data scientists use data exploration and visualization to confirm a hypothesis. They use machine learning to derive insights. As the first part of a series, this introductory tutorial shows how to build, configure, and use Apache Zeppelin and Spark on an HDP cluster.

Our commitment to Apache Spark is to ensure it's YARN-enabled and enterprise-ready with security, governance, and operations, allowing deep integration with Hadoop and other YARN-enabled workloads in the enterprise—all running on the same Hadoop cluster, all accessing the same dataset. At Spark Summit 2015 today, Arun Murthy, co-founder of Hortonworks, summed up why Hortonworks loves Spark.

Conclusion

As James Gosling put it 20 years ago in The Feel of Java:

By and large, it [Java] feels like you can just sit down and write code.

I feel, and presumably many others feel the same about Spark, that you can just sit down, fire up a REPL (Scala or PySpark), prototype, interact, experiment, and visualize results quickly.

Spark has that feel to it; its APIs have that expressive and extensible nature; its abstraction and concepts eclipse the myriad complexities of underlying execution details. Far more importantly, it’s a developers’ delight.

Learn More


Filed under hadoop, spark

4 Easy Building Blocks for a Big Data Application – Part I

by Jules S. Damji

Engineering Mandate in the Old Hadoop World

Two years ago, before I moved to Washington, DC, to pursue a postgraduate degree in Strategic Communication at Johns Hopkins University, I worked for a publishing software company, Ebrary (now part of ProQuest), as a principal software engineer. We digitized books for academic libraries and provided a platform as a service to university libraries so students and faculty could borrow, read, and annotate books on their mobile devices or laptops.

My director of engineering called an engineering strategy meeting. The small group of five senior engineers huddled around a small table in a fish bowl conference room.

Animated, he said, “We have a new mandate, folks. We need to scale our book digitization by ten-fold. We need to parallelize our digitization efforts so that we can digitize not 50 books a week, but five hundred or five thousand. As many as they [publishers] send us. We must revamp our infrastructure and adopt Hadoop clusters! ”

And, of course, adapt to a distributed, scalable, and reliable programming paradigm—do all the digitization in parallel, as MapReduce jobs, on a 50-node cluster.

The effort to install, administer, monitor, and manage an Apache Hadoop cluster, back then, was not only a herculean task but also a stupendously steep learning curve. None of us had the operational experience; none of us, even remotely, had the MapReduce parallel programming experience of the folks at Yahoo or Facebook or Google, who had pioneered and heralded an enormously powerful and highly scalable programming model that accounted for distributed storage, distributed computation, distributed large data sets, and distributed scheduling and coordination. Our discovery process was disruptive, instructive, and illuminative.

We abandoned our efforts and settled for managed AWS and EC2 as our underlying infrastructure. Not because the Apache Hadoop ecosystem did not live up to its promise. (In fact, other companies with large engineering staffs, such as Netflix, were the showcase and social proof of Hadoop's promise.) But because it was too complex to manage, and we were short of operational experience.

The New World of Hadoop and Big Data

But I wish I could relive and re-tackle the same problem today. After two years away, I'm back in Silicon Valley. And the world of the Hadoop ecosystem has moved and improved at a lightning pace. Now we have "Big Data" companies that offer Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Cloud as a Service (CaaS). We have companies that have become the Red Hats of the Hadoop ecosystem. To name a notable few: Cloudera, Amazon, Rackspace, MapR, Hortonworks, etc.—they have mitigated the pain (for you as a developer, an administrator, a dev-ops engineer, or a startup founder), burden, and headache of managing a Hadoop cluster and infrastructure; they have innovated additional components that complement and render the Hadoop ecosystem a powerful distributed execution environment to handle, store, compute, aggregate, and process large datasets. In short, they have invented an aspirin for IT administrators' headaches.

My former big boss at Loudcloud (Opsware), Marc Andreessen, once said at our all-hands meeting, "We want to be the operating system of the Data Center." In a similar way, these Big Data infrastructure companies have successfully become the operating systems and execution environments of highly complex, immensely powerful, and incredibly fault-tolerant clusters of thousands of nodes.

So what’s the X-factor in the New Big Data World?

Yet one thing is missing: a developer-focused application platform that shields the complexity of the underlying Hadoop ecosystem, that offers an intuitive application programming interface as building blocks for distributed tasks, and that allows a developer to quickly and easily develop, deploy, and debug big data applications. What's missing is a big data application server on which big data applications are deployed as a single unit of execution—a jar file, analogous to a war file in a J2EE application server.

Until now.

The new big data startup company Continuuity, Inc. offers all the above desirable features. Its Reactor application platform shields the underlying Hadoop ecosystem. It runs as middleware on top of an existing Hadoop ecosystem. Its Reactor Java SDK abstracts complex big data operations as programmable building blocks. And its local Reactor allows developers to quickly develop, debug, and deploy their big data apps as jar files.

It’s the X-factor in the new Big Data world, offering developer tools to build big data applications. The Continuuity CEO and cofounder Jonathan Gray seems to think so: “We’ve built middleware, or an application server, on top of Hadoop.”

“We’re a Hadoop-focused company. What makes us different from everyone else is that we’re a developer-focused company,” said Gray.

The Big Data Operations and the Four Reactor Building Blocks

Invariably, engineers perform four distinct operations on data, traditionally known as extract, transform, and load (ETL). The engineers collect, process, store, and query data. Most programmers implement these ETL operations as batch jobs, which run overnight and massage the data. But that is no different today, except the scale is enormous and the need is immediate (in real time).

The Continuuity Reactor SDK offers these operations as programmable building blocks. They translate easily into comprehensive Java interfaces and high-level Java abstract classes, allowing the developer to extend them, implement new methods, or override existing ones. These sets of classes can be used collectively to implement the traditional ETL operations as scalable, distributed, and real-time tasks, without you as a Java developer needing deep knowledge of the underlying Hadoop ecosystem.

The goal is to empower the developer and to accelerate rapid development. A programmer focuses more on developing big data apps and less on administering the cluster of nodes. While the Continuuity application platform, interacting with the underlying Hadoop ecosystem, takes care of all the resource allocation and life-cycle execution, the software engineer focuses on programming the ETL operations on large data sets.

The table below shows the equivalency between big data operations and Reactor’s building blocks.

Operations as Logical Building Blocks

    Big Data Operation    Reactor Building Block
    Collect               Streams
    Process               Flows & Flowlets (Processors)
    Store                 Datasets
    Query                 Procedures

Building Block 1: Streams

Think of Streams as entities that allow you to ingest large, raw datasets into your system, in real time or batch mode. With the Java API, you can define Streams. With the REST API, you can send data to a Stream. And Streams are then attached to another building block called Processors, which transform data.

Building Block 2: Processors

Processors are logical components composed of Flows, which are in turn composed of Flowlets. A Flow represents a journey of how your ingested data are transformed within an application. Think of Flows & Flowlets as a Directed Acyclic Graph (DAG), where each node in the DAG is an individual Flowlet, the code and logic that will process, access, and store data into Datasets.

Building Block 3: Datasets

For seamless access to the underlying Hadoop ecosystem's storage facilities, Datasets provide high-level abstractions over various types of tables. The relevant tables' Java API provides high-level read-write-update operations, without your needing to know how data is stored or where it's distributed in the cluster. You simply read, write, and modify datasets with simple access methods. All the heavy lifting is delegated to the Continuuity Reactor application platform, which in turn interacts with the underlying Hadoop ecosystem.

Building Block 4: Procedures

Naturally, after you have transformed and stored your data, you will want to query it from external sources. This is where Procedures, through their Java interfaces and classes, provide you with the capability to query your data, to scale your queries over multiple instances of Procedures, and to gather datasets and combine them into a single dataset. Procedures, like Flowlets, can access Datasets, using the same access methods.

Four Building Blocks = Unified Big Data Application

It's the mother of everything—it's the glue that configures, binds, and combines all four building blocks into a single unit of execution.

Conceptually, an anatomy of a big data application from the Continuuity Reactor’s point of view is simple—it’s made up of the four building blocks, implemented as Java classes and deployed as a single unit of execution, a jar file. And this notion of simplicity pervades and prevails throughout the Reactor’s Java classes and interfaces.

An Anatomy of a Big Data Application

[Diagram: anatomy of a Reactor big data application]

The diagram above depicts a logical composite of a Reactor big data application. Data are injected from external sources into Streams, which are then connected to Processors—made up of Flows & Flowlets—which access Datasets. Procedures satisfy external queries. Note that each building block, once implemented as Java classes, can have multiple instances, and each instance can then run on a different node in the cluster, offering massive scalability.

So What Now?

In this blog, I introduced high-level concepts pertaining to Continuuity Reactor (in particular) and related them to big data operations (in general). In the next blog, I'll explore the mechanics of writing a big data application using the four Continuuity Reactor building blocks with their associated Java interfaces and classes.

Meanwhile, if you’re curious, peruse the Continuuity developer website, download the SDK, deploy the sample apps, read the well-written developer’s guide, and feel the joy of its simplicity, without worrying about the underlying Hadoop complexity.

It may surprise you, even delight you, as it did me!

Resources

  1. http://www.continuuity.com
  2. http://www.continuuity.com/developers/
  3. http://ubm.io/1jSBiuS
  4. http://www.continuuity.com/developers/programming


Filed under Big Data, Programming, Technology

How Cool Geeks Make a Difference in the World

If you’re a socially conscious nerd, cool, and chic, you can make a difference in the world, wrote the Economist. Indeed, being geek is chic, especially in the Silicon Valley, where it’s a norm among the young aspiring next Mark Zuckerbergs, Marissa Mayers, or Jessica Jackleys of the world. But that trend is not unique to the Silicon Valley. Here in DC, the crucible of NGOs and non-profits, social entrepreneurship, blending innovative technology with social change, is an emerging trend—with results to prove its efficacy and effect.

The Uber Geek

One such cool über geek is Nick Martin, the co-founder and president of TechChange, a tech startup company that offers online technology training for social change. A graduate of Swarthmore College and the University for Peace, Nick heard his clarion call to social change early on, when he started an award-winning conflict resolution and technology program for elementary schools called DCPEACE.

“I knew I wanted a career in social change. I felt strongly it should involve tech in some way. I believed education was the key to everything,” said Nick.

This desire for social change, using innovative technology and education, was the genesis of TechChange. Its primary goal has been simple: offer online technology courses to people, both locally and globally, so they are empowered to employ technical skills gained in their field.

“We’ve got a nice niche focusing on online learning for international development and helping organizations deliver their content,” said Nick.

To that end, Nick advocates for geeks who make a difference in the world, teaches at many prestigious DC institutions, blogs, and travels regularly to Africa.


Nick teaching a course at Amani Institute and iHub in Nairobi.


Nick at the 1776 TechCocktail Startups.

You as the Uber Geek

But you don’t have to be Mark, Marissa, Jessica, or Nick to make a difference in the world. You can be an ordinary Joe or Jane, a graduate or an undergraduate student, a working professional or retired. And if you’re interested in working in the developing world, in the areas of conflict management, health provisioning, governance, and aid management, you can do so—by using the platforms these entrepreneurs have created and by following the advice they’ve imparted. Nick’s advice and insight for students at TechChange are:

  • What you put into the courses is what you’ll get out of them.
  • Though the courses are designed for a powerful, engaging, and interactive learning experience, you still have to be self-directed and motivated.
  • Online learning should be social, so share as much about your best practices with others in your class as you can. In a course like mHealth, you may be sharing your best practice with a health worker in Uganda or a doctor in Argentina.
  • Development agencies want employees with tangible technological skills who can transition outdated processes and update them with automated ones.

Other Uber Geeks

Some students at TechChange are development professionals working for international aid agencies. Some are recent graduates. Some are tech savvy. Some are completely new to technology but with an abundance of passion. In other words, they are all cool geeks who want to make a difference. One student, Trevor, who enrolled in several courses at TechChange made such a difference. He shares his journey.

“I found three key features of TC105 very valuable to me: the relevant information, the interactive experience, and the access to a network of experts in mobile tech,” said Trevor.

Uber Social Entrepreneurs

For those who aspire to go beyond and become social entrepreneurs, Nick advises:

  • You can’t do it alone. Find a team with complementary skills who feel as passionate about social change as you do. As Harvard Business Review writes, “social entrepreneurs alone cannot change the world [or make a difference]. They need artists, volunteers, development directors, communications specialists, donors, and advocates across all sectors to turn their groundbreaking ideas into reality.”
  • Everything will take twice as long and cost twice as much, so plan accordingly.
  • Fail rapidly, learn quickly. Get the product out first. And constantly improve.
  • Do a few things well, and see how the market responds.

As one study claims, “The clarion call for just sustainable development is being taken up all over the world by individuals, non-profit organizations, community groups, schools, businesses, governments, and other organizations. These groups are developing appropriate technologies, small businesses, and products to aid the creation of a just sustainable world.” In other words, they’re making a difference; they’re making the change.

So if your clarion call is to make a difference in the world, like Trevor, or to become a social entrepreneur, like Nick, the choices are simple.

Be a Cool Geek. Be Chic. Be the Change.

by Jules S. Damji


Filed under Technology

Traits of a Good Spokesperson

Friedman (2002) recalls her encounter with Dr. Ross when she took her nine-week-old sick son to him. She talked at length to the doctor and then asked if he wanted to examine the baby.

“Of course. But even before I examine him, I know what’s wrong,” Dr. Ross smiled and said.

She wondered how that was possible.

“By listening to mothers, I usually learn most of what I need to know,” Dr. Ross said.

Spokespersons’ Desirable Qualities

Good spokespersons, Friedman (2002) asserts, listen intently before they speak. They know and understand their audience, including the media. They're credible, accessible, knowledgeable, ethical, and likable. They use simple language, free of jargon. They are advocates who are well prepared and know what they want to say, how, and when. They understand their clients' (or principals') interests. They're audience-centric and media-centric, not ego-centric. And they answer all questions truthfully and factually to the extent they can, without spinning, without selling.

These characteristics are an embodiment of a good spokesperson. In a corporate and political world, and in a media-driven society, perception becomes reality (Jones, 2005). As spokespeople, they represent the voice and perception of an organization to the world, and the media engenders that perception through the spokesperson.

The Role of a Spokesperson

Sometimes a spokesperson is part of an organization’s communication team. Sometimes a spokesperson is a CEO. Sometimes there are multiple spokespersons, each representing an area of their expertise. And sometimes a spokesperson is part of an outside organization, such as a public relations firm.

No matter who is chosen, his or her role is crucial: to represent the client in a favorable light in the media, as truthfully, factually, and credibly as possible (Stewart, 2004; Pierce, 2012, 2013). A spokesperson's role as an advocate is bidirectional: he or she needs to clarify an organization's viewpoint to the media as well as understand the media's intent and interest in the organization (Doorley & Garcia, 2011).

A Bad Spokesperson

Lack of credibility and authenticity can break a spokesperson, as can insincerity or insensitivity. During the Exxon Valdez oil spill, CEO Lawrence Rawl was visibly absent for days, while the pristine northwest shores were awash with oil-covered dead sea lions, geese, and fish. When Rawl appeared in the media on ABC's "Good Morning America" to answer questions, he was evasive, defensive, and offensive. He showed no concern or compassion. He blamed Captain Hazelwood for the disaster. His body language and spoken words communicated Exxon's disregard for the public—and the environment (Friedman, 2002; Fearn-Banks, 2011).

Also, the words matter. In the Gulf Oil Spill and the Deepwater Horizon crisis, BP’s CEO Tony Hayward’s insensitive remarks to the media that he was tired, and “I want my life back” offended the Gulf residents whose lives had been disastrously shattered.

To add insult to injury, the person who replaced Hayward as spokesperson was equally inept with words. He said, "BP will take care of the little people," when describing compensation for the victims.

If Hayward had expressed genuine empathy to the victims (instead of complaining about his life) and had he been immediately visible and accessible (instead of appearing reluctant), we might have seen a different outcome, at least in peoples’ perception of BP’s intentions.

So the context and words matter as much as credibility, honesty, humility, and accessibility—and so does the preservation of goodwill with the public.

A Good Spokesperson

In contrast, passion, honesty, humor, wit, confidence, consistency, and commitment to your principal’s best interest are also key attributes for a good spokesperson. Mike McCurry, former White House press secretary for President Clinton, was an exemplary model.

President Clinton said: "Quite simply, Mike McCurry has set the standard by which future White House press secretaries will be judged. In an age where Washington has come to be governed by a 24-hour news cycle and endless cable channels with their special niche audiences, Mike has redefined the job of [a] press secretary in a new and more challenging era" (Baker & Kurtz, 1998, p. A01).

And McCurry's colleagues and scholars of presidential spokespersons shower him with similar superlatives. Not only was he the most effective presidential press secretary, but he also had the tools: "substantive knowledge, the utter confidence of the president and the senior staff and the respect of the press corps as well." And he had the "wit and personality to shrug off the worst kind of barbs" (Baker & Kurtz, 1998, p. A01).

During a presidential or organizational crisis, when the media are hungry or hostile, a spokesperson's commitment to serve his or her principal is of paramount importance. For instance, McCurry handled the Lewinsky scandal well, even though it appeared that the White House was stonewalling. Who would want to be in his shoes facing the media? Yet he said, "the best way to serve your principal is to help the press write the story, good or bad."

Presidential assistant Rahm Emanuel said, "He who commands the podium commands the room." McCurry commanded the podium, in part because he understood the press corps, in part because he listened, and in part because he fended off the press barbs with aplomb. In other words, McCurry cultivated all forms of communication with brevity, clarity, levity, and charity. Those simple concepts of communication were presidential speechwriter Ted Sorensen's gift to all forms of communication (Dragoon, 2011), including media relations.

In short, the best spokesperson, advises Friedman (2002), knows when to stop talking and start listening. He knows what to say, and how and when to say it. He understands the audience and empathizes with it—which is what she discovered when she took her nine-week-old sick son to Dr. Ross.

References

Baker, P., & Kurtz, H. (1998). McCurry exit: A White House wit's end. The Washington Post. Retrieved from http://wapo.st/1ay8a6L

Doorley, J., & Garcia, H.F. (2011). Reputation management: The key to successful public relations and corporate communication. (2nd Ed.).  New York, NY: Routledge.

Dragoon, J. (2011). Ted Sorensen’s gift to marketing. Forbes. Retrieved from http://onforb.es/qPJdYX

Fearn-Banks, K. (2011). Crisis communication: A casebook approach (4th Ed.).  New York, NY: Routledge.

Friedman, K. (2002). What makes a good spokesperson. Retrieved from http://www.karenfriedman.com/

Jones, C. (2005). Winning with the news media: A self-defense manual when you’re the story. (8th Ed.). Anna Maria, FL: Winning News Media, Inc.

Pierce, W. (2012). Crisis communication lecture notes and power point slides.

Pierce, W. (2013). Spokesperson training and development lecture notes.

Stewart, S. (2004). A guide to meeting the press: Media training 101. Hoboken, NJ: John Wiley & Sons, Inc.


Whom The Cap Fits Let Them Wear It

During the dot-com boom in Silicon Valley, a hiring manager at a hot high-tech startup led a hot-shot software engineer, recently poached from a competitor, to his small, shared cubicle.

“Is this where I work?” said the programmer.

“Yes, everybody here works in a shared cubicle.”

“Well, I am not everybody,” snarled the programmer.

Software engineers, like novelists, have enormous egos. Just as novelists are committed to producing perfect prose, so are software programmers devoted to crafting concise code. Some programmers (and novelists) are unassailable—or at least they like to think so. The younger ones tend to be more flexible, immensely creative, and productive; the seasoned ones are less flexible but open to good arguments. Yet their confidence is infectious, their energy vigorous, their focus enviable.

Listen, Learn, and Leverage

Over the years, I have had the good fortune to work with (and mentor) a few programmers. The biggest challenges are learning how to manage a mixed lot of intense and brilliant minds, how to get them to collaborate on their creative design ideas, how to examine alternative solutions, and how to select and justify the best solution that will address the problem. Those are not easy challenges to meet, nor does a cookbook recipe exist for them. But there are a few noteworthy nuggets and insightful ingredients to keep in mind.

My noteworthy advice is that no idea should ever be tossed out simply because it seems unworthy. In other words, you must examine all ideas, scrutinize them, and explore their shortcomings, not critically but constructively. Donald A. Norman suggests in The Design of Everyday Things that one must explore every design idea presented and not disregard or discard an idea unless you can offer a better one.

Collaboration of Best Ideas

This exploration of the best idea is the first ingredient in managing innovation from a pool of design ideas. Steven Johnson, in his talk Where Good Ideas Come From, explains that most good ideas evolve over time. The best ones emerge from collaboration with other ideas. Steve Jobs said that his teams would examine over a dozen ideas during their design sessions, and maybe five would emerge, after careful exploration, as potential candidates. From those five, the best one—or a collaboration of two—emerges as the winner.

So if you are the head of a software design team, explore the ideas openly and discuss them constructively. For in the end, the best innovative idea will emerge, as Johnson and Jobs advocate.

The second ingredient in managing the innovation of good ideas is an architect who probes and provokes, who cultivates and culminates this discussion, a task no different from that of a moderator who mediates a complex and controversial political debate or manages a panel of egotistical experts with ease and skill. With certitude and confidence, a good architect can conduct these design review sessions.

And the last proven ingredient is your ability to listen and learn from someone less experienced, your ability to extract and explore the essence of an idea, and your ability to articulate and argue why an idea, after exploring all the alternatives, is an ideal candidate. Here is where your insight, your judgement, your experience, your wit and wisdom come together to pluck the right choice.

Collaboration and Crowdsourcing

An exploratory approach to picking the best innovative idea among many is not unique to software design. It’s common in political campaigns; message development; product development, design, and strategy; employment candidacy; architecture; and policy decisions. In fact, crowdsourcing for good ideas is not uncommon among corporations. For example, in March 2008, Starbucks introduced MyStarbucksIdea, a site where people could submit ideas to shape its future products. Within the first year of launch, people submitted 70,000 ideas; Starbucks implemented 94 of them and launched 25 products.

In the end, among many distinct heads and many disparate caps, the right cap will always fit the right head, so whom the cap fits, let them wear it.


Attack Against Humanity

by Jules S. Damji

(remarks read at the Vigil—Lincoln Memorial, Washington, DC)


I am not here simply because members of my community or tribe died at the hands of the terrorists at the Westgate Mall.

But I am here because a terrorist attack in Boston is an attack against humanity.

I am here because a terrorist attack in New York is an attack against humanity.

And I am here because a terrorist attack in Nairobi is an attack against humanity.

You’re here because the blood that runs in your veins is no different from the blood shed at the Westgate Mall—and you feel its senseless loss, its pain.

You’re here because the spirit and the flame of life, the flame of the candle you’re going to hold tonight, that flame that glows inside your heart, is no different from the one that was extinguished in your friends’ hearts at the Westgate Mall—and you feel its darkness, its emptiness.

President Abraham Lincoln said, “A friend is one who has the same enemies as you have.”

And our enemies are those cowardly, callous terrorists, who preyed and pounced upon innocent citizens, our friends, our humanity, at the Westgate Mall.

But our common humanity will defeat them.

Our common courage will vanquish them.

The Great Madiba said, “Courage is not the absence of fear; it’s the conquest of fear.”

And the heroes at the Westgate Mall who showed courage and saved others were not without fear. But they conquered that fear and rose to fight the evil.

Like the five-year-old British boy who stared into the eyes of an Al-Shabaab terrorist, who first shot his mother in the leg and then pointed the gun at the boy.

The boy said, “You’re a coward man! Leave my sister and my mother alone.”

Imagine: a little boy standing up to a terrorist armed with an AK-47 that could cut him to shreds and a grenade that could blow them all to smithereens.

But the boy prevailed, because he conquered fear, because he showed courage.

So can humanity vanquish terrorists, with our collective courage!

May our friends rest in peace.

May God furnish strength to their families.

And may God grace all Kenyans.

Thank You.



Open Government Data: A Paradigm Shift

  1. Dr. Anstey’s eloquent, passionate appeal for government open data and transparency set the stage for the rest of the day, with brilliant speakers, open data stories from around the world, best practices, open data advocates and evangelists, policy and technical panel tracks and discussions—and a general buzz around the World Bank HQ.

    Not to mention the attendees live tweeting and blogging. Soon #IOGDC was trending on Twitter, with over 700,000 impressions within a couple of hours.
  2. Some of the tweets from earlier in the day that distill and capture precious nuggets of wisdom from the speakers and panelists are no longer accessible, but I found a few.
  3. meowtree
    yes and no RT @2twitme: “@WorldBankLive: Anstey: Give us data, we will finish job could be cry of citizens today in war on poverty.#IOGDC”
  4. Coders4africa
    Indeed!! RT @2twitme Many apps on #opendata from coders in #Africa So not prerogative of only developed nations #IOGDC http://mywarming.org/
  5. jcrowley
    #iodgc Does the open data adoption curve have a tipping point, when data gatekeepers get less effective? What are the form of feedbk loops?
  6. WyattKash
    RPI’s James Hendler at #IODGC: World now has 1,028,054 data sets, in 43 countries, 192 catalogs, 2,460 categories, in 24 languages #opengov
  7. 2twitme
    Jeanne holm: #opendata is moving us towards open cultural change: transparency, participation, collaboration, curation & narration. #IODGC
  8. Rufus Pollock, co-founder of the Open Knowledge Foundation and CKAN project lead, was on the panel and will deliver a keynote address Thursday. Al Kags summarized lessons learned from the open data initiative in Kenya.
