Recently, I responded to a LinkedIn article about a cyber attack on a cancer center. The person who posted it basically asked: why would a hacker go after something like a cancer center or a school? After reading the posts, it seemed to me that there is a lack of understanding as to why and how hackers actually do what they do.

Why Do Hackers Hack?

One of the biggest misconceptions about hacking is the belief that you are not important or that you don’t have anything that hackers want. The issue here is that people don’t understand why people hack…



For the last few years, I’ve been involved with Splunk engineering. I found this to be somewhat ironic since I haven’t used Splunk as a user for a really long time. I was never a fan of the Splunk query language (SPL) for a variety of reasons, the main one being that I didn’t want to spend the time to learn a proprietary language that is about as elegant as a 1974 Ford Pinto. I had worked on a few projects over the years that involved doing machine learning on data in Splunk, which presented a major challenge. …


A Job Interview
https://unsplash.com/photos/bwki71ap-y8

Realities of a Data Science Job Search

I’ve been waiting for some time to publish this, but I wanted to write about my experiences interviewing for data science jobs. Here’s my story: I worked at Booz Allen for nearly seven years, but I felt it was time for a change. I very much like Booz Allen as a company, and if anyone is interested in working there, please don’t hesitate to contact me. But I felt I was ready for different challenges and started looking for work elsewhere.

Now that I have started a new position, I thought I’d share some observations about what I learned from interviewing…


People often ask me questions about starting a career in data science or for advice on what tech skills they should acquire. When I get asked this question, I try to have a conversation with the person to see what their goals and aspirations are, as there’s no advice that I can give that is universal. That said, here are five pointers that I would say are generally helpful for anyone starting a career in data science or data analytics.

Tip 1: Data Science is a Big Field: You Can’t Know Everything About Everything

When you start getting into data science, the breadth of the field can be overwhelming. It seems that you have to be an…


One of the big challenges a data scientist faces is the amount of data that is not in convenient formats. One such format is the REST API. In large enterprises, these are especially problematic for several reasons. Often a considerable amount of reference data is only accessible via REST API, which means that users are required to learn enough Python or R to access this data. But what if you don’t want to code?

The typical way I’ve seen this dealt with is to create a duplicate of the reference data in an analytic system such as…


The image above is a late-1950s MGA convertible. I picked this because I happen to think that this car is one of the most elegantly designed cars ever made. Certainly in the top 50. While we as people place a lot of emphasis on design when it comes to physical objects that we use, when it comes to software, a lot of our software’s design looks more like the car below. This vehicle looks like it was designed by a committee that couldn’t decide whether they were designing a doorstop or a golf cart.


Today marks about the 45th day I’ve been stuck in the house and it happens that my birthday was last week, so I’ve been doing a lot of reading and reflecting on things. The last few weeks have been really up and down. I’ve been doing a lot of puttering around the house and working on silly projects like replacing the headlight gaskets on my MGA, which also involved painting the headlight buckets, cutting off rusty screws and redoing wiring, but I digress. Despite being home all the time, I’m finding it very difficult to get any meaningful work done.


As you are reading this, you are probably (like me) under quarantine or shelter in place due to the COVID-19 outbreak. As a data scientist who has been stuck in the house since 10 March, I wanted to take a look at the data and see what I could figure out. I’m not an epidemiologist and claim no expertise in health care, but I do know data science so please take what I am saying with a grain of salt.

Why is there no data?

My first observation is that very little data is actually being made publicly available. I am not sure why this…


There is a data format called HDF5 (Hierarchical Data Format) which is used extensively in scientific research. HDF5 is an interesting format in that it is like a file system within a file, and is extremely performant. However, it can be quite difficult to actually access the data that is encoded in HDF5 format. But, as the title suggests, this post will walk you through how to easily access and query HDF5 datasets using my favorite tool: Apache Drill.

As of version 1.18, Drill will natively support reading HDF5 files.

Configuring Drill to Query HDF5

In order to configure Drill to query HDF5…


At the beginning of any data analytic effort, one of the key determinants of success is the data itself. Often, in real-world situations, the data will come in a variety of formats and be stored in a variety of systems. As a former government contractor, I’ve been in several situations where I’ve been called upon to analyze “big data”, which is a government term for a share drive full of random Excel files. While this isn’t big data in the truest sense, it is still annoying data, or data in the wild. Other examples of untamed data might be gigabytes of log…

Charles Givre

Founder of DataDistillr, Data Science enthusiast, Instructor and Apache Drill PMC Chair. Contact me at charles(at)datadistillr.com.
