
Designing for Analytics (Brian T. O'Neill)

In Episode #003, I talked with Mark Madsen of Teradata about the common interests of analytics software architecture and product design. Mark has spent most of the past 25 years working in the analytics field, and he is currently the global head of architecture for Teradata Consulting. He is a true analytics pioneer and a regular international speaker who also chairs several conferences and sits on the O’Reilly Strata, Accelerate, and TDWI conference committees. If I only looked at job titles, Mark would be an odd fit for Experiencing Data, but the reality is that Mark has many of the traits of a good design thinker, including a strong sense of empathy about what users need in the world of analytics and decision support software. It's a rare combination in my experience, so I hope you enjoy the interview. Besides, Mark is also highly entertaining.

"The reason that these things fail is that people think they need to build an intergalactic data system to solve that problem." -- Mark Madsen

"You don't build libraries by stacking books and hoping to find order in them. You figure out orders and then impose those orders in order to solve the problem." -- Mark Madsen

"Open-ended problems and broad problems tend to not lend themselves to traditional engineering design solutions and that's where you really hit back again on UX as a starting point." -- Mark Madsen

"The interesting thing to me is the knack for software developers and the educational program we have for software development is all based on function,

“What it is you need to do?” -- Mark Madsen

"We used to call it decision support. We didn't call it business intelligence or analytics or anything like that. I still like that old term." -- Mark Madsen

Brian: Alright. Mark Madsen, are you there?

Mark: I’m here.

Brian: Sweet. And where is here? Where are we talking to you from?

Mark: You’re talking to me from Mount Tabor in Portland, Oregon. The only volcano inside the city limits of a city in the US.

Brian: Fun facts, alright. We’re already into fun facts.

Brian: Exactly. We have Mark Madsen, who’s the—correct me if I’m wrong—chief architect of Teradata. Although, as I recall from when we met in London at the O’Reilly Strata Conference, your business card is null. There is no title. Can you tell us why there’s no title on it?

Mark: I can tell you two things. One, I’m a chief architect for Teradata services, not for all of Teradata, not the Teradata products side, so the services side. The cards are null because the conversations a chief architect has might be with very detailed developers, or they might be with IT people in management, or they might be with executives. You don’t want to set people’s expectations based on a title when you talk to them, so you leave it blank and then you just talk about what you do instead, based on what they are interested in.

Brian: That sounds like some of your consulting background at play. As I recall, you were an independent consultant in this whole BI and analytics space for quite some time. Can you tell us a little bit about your background? If I recall correctly, you started in game design for 8-bit Apple computers, and then you did some AI projects about 20, 30, 40 years too early. You did some mobile robotics work a little bit too early as well. At least too early in the sense of prior to when these technologies became daily topics rather than academic topics. Can you give us some background on where you came from?

Mark: Yeah, actually that’s a really good point. A lot of things went from academic projects, back when I was playing around with this stuff, to commercial now. The thing about academic projects is that they’re not commercially viable much of the time. I was doing all this stuff when it wasn’t viable to do it, I guess. But yeah, the AI work with expert systems was the final stage of the death of AI back in the late 80s. That was funded by me in my spare time, writing 8-bit video games that went out on diskettes for Apple II computers. That’s how I paid, in part, for college.

Not a whole lot of relationship there, although I was always interested in, "Well, if I have an expert system that understands this, could I apply it within the context of the game and make a smarter opponent?" which has come full circle now, because they’re not expert systems anymore. They kind of are, but that’s how those things came together, and that AI stuff led to the robotics stuff, because if you’re trying to do autonomous robots, there’s this intersection there.

That was in the early 90s and that was a bit too early as well. That’s how I got that start but all the psychology, behavioral economics, and AI stuff led me commercially to data. When I left academia I was like, “Well, you can either make a fraction of a normal income or you can apply your skills to business.” Business is much easier in the intellectual rigor sense, but it’s much harder in the complexity sense. You trade one set of puzzles for another and it works out.

Brian: That’s an interesting perspective on it. I want to get into your business insight on this whole world of data and analytics, and now we’re talking predictive analytics and machine learning. There’s all this stuff going on in this space, and of course, the theme of our discussion on this podcast is the tie into user experience: helping drive customer value, but also ensuring the tools and enterprise products we build actually get used. Ultimately, they should actually provide decision support, and they should either make money, or reduce cost, or whatever those end goals are, which sometimes are not clearly defined.

I’m curious about why you care about user experience. The vibe I got when we met in London, I think it was at the Strata Conference, relates to this thing I have when I meet people who aren’t designers by title: there are natural design thinkers out there. They might have a technical title or something, and usually my radar for that is strong. I’m like, "Okay, this guy has that bone in him," because you were talking to a room full of tech people about experience.

One of the quotes I saw in one of your slides was, "The right tool is the one people will use and not the one that you want them to use." Tell me a little bit about your interest in that last mile. If the whole data pipeline, from data ingestion all the way through to some experience at the end, if that’s the marathon, you seem to be aware of the value and the importance of the last mile. Where did that come from? Why do you care about experience, and what has being aware of that done for your career?

Mark: That’s a good question. I mean, this is at the crux of a lot of product design. I was just going through a product design problem today with a company that makes the service request system that we use to fulfill our IT service requests, which has a problem on one of its user submission forms. Things that are so basic and are infuriatingly frustrating because they prevent you from doing what you need to do.

I have a lot of empathy for people, but really, professionally, for years I was a programmer, and you’re just sitting there writing the guts of systems. When I started doing data projects, it was early on, because first we were applying behavioral economics and things to decision making. It was all decision theory stuff that I was working on, trying to incorporate context or AI assistance. We used to call it decision support. We didn’t call it business intelligence or analytics or anything like that. I still like that old term.

In order to do that, you were trying to put a computer with a person who is not a technical person, and make apparent the information that they need, or focus them on the important information, either by letting them find it or by guiding them to it. Those things are very high touch. If you get them wrong, then these systems do not get used, or the results are not what you hoped for. That’s what led me down there. The other aspect is totally unrelated, which is that if you write a video game, a video game has to draw you in. You have to be engaged in whatever world the video game creates, and the aesthetic of that world, and the rules one operates by in that world. That requires that you approach the problem user-first or player-first.

Brian: Do you think the fact that we now have an oversupply of data and an undersupply of good decision support has created more of that awareness? The average executive or VP that I talk to as a client these days knows what UX is now. It’s no longer even a matter of explaining it; they understand. Maybe they don’t understand it fully, but they understand its value. Do you think that came out of the fact that there’s a glut of data now, and now we have this problem of making it accessible? Did the glut create the awareness of, "Oh, this is a thing we actually do need to care about"? Does that make sense?

Mark: Yeah, it makes a lot of sense. I think there are multiple things going on. I think that’s a big one, what you just said, but I think there’s a deeper reality to it. One of the things is that my career spanned the period when nobody had a computer on their desk at work to the period where everybody does. One of the things about that early period: business managers’ sole experience was VisiCalc, the very first spreadsheet, running on an IBM PC or an Apple II, and that broke the IT monopoly, which was green phosphor screens.

That was the state of the art: cryptic incantations and IT in control of everything, and the spreadsheet put things into consumers’ hands. That created, first, the ability to do stuff like spreadsheets. But then, when people started figuring out that you could take some of your business data and jam it into the spreadsheets, we started to try to link the two things together—the mainframe and the spreadsheet—and that’s what got everything off and running: making more things more useful. But along that path, there was a lot of design work that’s uncredited in that history, and it relates to the oversupply of data and the undersupply of usable systems for it, which is what you just said.

I think that what you put your finger on is key. You go through various periods of history and data gets made available, but we don’t know how to make it usable or findable or whatever, and every system has a pivot point where at first there’s not enough or just enough stuff but eventually there’s too much stuff.

You hit on one of the keynote topics I did for a Strata Conference back in 2012 or so, on the history of information explosions, and the history of data now is kind of the same. We’ve got lots of data and it’s distributed across silos and systems and repositories and websites. You’re trying to find all these things that are applicable to your situation and use them. It used to be that most of that data got jammed into a warehouse. It got that way because there were a bunch of different mainframe and minicomputer applications that each had pieces of it. You put it into a warehouse to get a holistic view, so that you could find and dedupe things. That whole design paradigm took a decade or more to develop and then 25 years to mature to today’s state, which is not supporting today’s needs.

But this goes all the way back to clay tablets. I had a research question once, which was simply: how does one manage large collections of information when they exceed the capacity of the technology? Too many files begets databases, that sort of thing. And in clay tablet land, the question I had was, "Gosh, if you’re recording taxes on clay tablets, how do you manage them? What does it look like to have your tax records on big hunks of unbaked clay?" That led to a lot of digging around in Mesopotamian archaeology and the information architecture of libraries like Ashurbanipal’s, which is being, or has been, I think, partly reconstructed at the British Museum now.

The artifacts that they came up with to tag, essentially to build metadata, to organize and structure for findability—because if you want to look at last year and compare it to this year, to see whether the harvests are better or worse so you know what levy to place on the goods—all of that stuff requires information retrieval. And it turns out that the techniques that were used 7,000 years ago and the techniques that we use today are, from an information-theoretic perspective, exactly the same. But we keep forgetting that, and we build things, and then the technology becomes the view of the problem.

And so instead of thinking from principles, you think from technology, and you end up where we are now. You have this oversupply of information but everybody’s viewed it through a technical lens. The BI stuff, for example, is crap tooling for today’s information landscape which is a glut, but it was perfect for yesterday’s landscape as the solution to the previous glut.

Brian: One thing I want to stop you on, which I really liked, was when you talked about the tax levy and what the crop yield was the previous year. This is a great example of focusing on the end user’s problem, and this is something I see with clients. If I’m talking to someone on the engineering side, they’re thinking implementation; they’re thinking, "How do we aggregate all of the previous crop data that we have?"

And the actual user question is, "How much tax can I charge this year?" Probably they want to charge as much as they can without going too high. That’s what the problem actually is, but you might need previous crop history data to make that decision. If you don’t know that, and you look at it as, "We need to visualize the crop history data," then your chances of striking out are higher. Do you agree that’s a gap we see a lot in this space: the people building the services don’t always know what those tasks are?

Granted, some things are exploratory, but I find that a lot of times there’s an 80/20 rule, especially with tools that are designed for repetitive use. You need to support those repetitive tasks that people are going to do. If you know that the goal is to charge tax, "Every March I’m going to go and calculate next year’s tax," or whatever it is, the system should be designed to do that.

Mark: Yes. You’ve got a knack for finding some of the fundamental problems. I think you put your finger on a couple of them in that statement; I can only remember one now. But the point of focusing on that decision, that’s key, because that’s what we build data or analysis or analytics systems for. Whether it’s basic information retrieval, "What was the level of inventory in some warehouse?" answering the question, "Do I have enough or do I need more?" or something much more complex, like levying taxes in a kingdom where you really need to know what’s enough.

Is it to maintain the roads? Is it that you have to deal with the neighboring kingdom, and so you’ve got to pay a bunch of soldiers to go invade them, in which case you need more money? There is always an extended context to these things. But that is the starting point. The interesting thing to me is that the knack of software developers, and the educational program we have for software development, is all based on function: What is it you need to do?

The problem is that in building systems, building applications, we’re typically building things that collect data and do stuff on forms, "Fill in this user registration form to download this white paper," that sort of thing, or something complicated like an inventory management system. They’re very functional. You get functional specifications to do functional tasks, with very narrow, task-based context, and that task is embedded in a larger process, which is the end-to-end of, say, inventory management.

But inventory management in a business is one process that is part of a larger logistics problem. It’s also part of, say, the retail merchandising problem, because it feeds into the stuff that’s on the shelves: which stock should be on the shelves, and which stuff shouldn’t we sell anymore. All of these things get entangled in this bigger enterprise organizational workflow, and that is not a functional problem. That is a data- and decision-oriented problem.

The decision making that goes with it is interesting. That means that your functional solution has to be focused on decision-making, or on aiding it in context. At a narrow level, there’s one set of things that you’re enabling, and at the wide-ranging level, it’s completely different. Your approach to solving that is not what you learned; it’s not what you’re taught. All of the methodologies that support this tend to be very different from the Agile methods that everybody applies today. It’s a very interesting, difficult problem to address.

I think when you describe it the way that you did, it throws it into relief, because data problems came to be broader than a single system, and open-ended. Open-ended problems and broad problems tend not to lend themselves to traditional engineering design solutions, and that’s where you really hit back again on UX as a starting point. If you focus on the person, and how and what they do in a much larger context than functional requirements, it drives you to think about the problem differently and more holistically. That open-endedness is something that a lot of us as developers had to be trained out of in order to work on data systems.

That was kind of a long-winded answer, wasn’t it?

Brian: No, that’s okay. I think you hit on a lot of good points there, and I agree. Some of this stuff is squishy when you get into the difference between getting a team aligned around a scenario versus the functional requirements. I see this happen in Agile, too. Sometimes when teams are doing Scrum, they’re really taking old-fashioned requirements and just backing them into the template: "As a business analyst, I need to do X so that I can do Y." While I understand the spirit is there, they’re following the template of Scrum and writing stories, but sometimes they’ve never gone through the process of looking at the bigger picture, like, "Where does this guy do his work? How often does he do this? What’s his life like, and why does he hate doing this? Does he love doing this?"

They don’t know what that experience is like, what he or she wants or does not want to have, the task repetition that might be involved; so much of that context is lost. I think, again, falling in love with the problem and really getting your head around the problem is critical. Otherwise, it’s really just jumping to big architectural decisions and all this stuff about how you’re going to suck all this data in and spit it out the other end, and it could be a total fail.

Mark: If you look at the industry surveys, a lot of the recent market attempts have been a total fail. Gartner, Forrester, McKinsey, these analyst firms at various levels, either in IT or business, are saying that in the analytics and big data realm, the project success rate is somewhere in the order—depending on who you look at—of 10% to 20%. The standard, the baseline that has run through the software industry since the 70s, is 50%. It’s about a 50% failure rate, plus or minus five, and has been since the first paper I read on the subject of giant project failures, which was written in 1970.

You touched on Agile and things like that. Agile is a great methodology when you already know your fundamental architecture. If your problem is a web application, or let’s say you’re Etsy or somebody like that, there’s a pretty well-understood framework within which you operate, and Agile supports the exploratory work to build a feature. What I liked about it was that it got us away from a development model of know your requirements, build to those things, heavy upfront engineering, because websites and mobile applications are high-touch and user-dependent.

All this A/B testing and such is supported by that very method, and along with that, of course, you try to take some of the operational components of, say, the DevOps world and meld it all together, and that is great when you have that framework. The problem is when you have to deal with deeper information systems and the problems people are trying to solve with them. Data problems are just viewed as, "We’ll pile all the data together and then I will build a feature for it." That is exactly the wrong approach.

You don’t build libraries by stacking books and hoping to find order in them. You figure out orders and then impose those orders in order to solve the problem because the problem is one of something like say, findability which requires certain things, but there’s a lot more than that obviously.

Brian: You touched on the failure rates for these analytics and data projects. I actually wrote an article trying to gather up as many of these surveys as I could find. I think I only found about six. The sad part being, the November 2017 Gartner one was 85%. They actually put out a funny tweet like, "60% of all big data projects fail," with it crossed out: "oh, we meant 85%." It was so funny. It’s been bad for a long time, and something is wrong here with these big enterprise systems. This actually gets to my next question for you. You might be a really good person to answer this, or at least have a perspective on it. It even touches on the whole Agile thing.

A lot of times, when I’m working on a new product or a new application, if they want to do Agile, I don’t think Agile is always the right choice for what we would call a design sprint or sprint zero. I still feel like a more traditional design process needs to happen. You need to build a runway of design work before Agile is going to deliver the returns for the business that it’s supposed to. I don’t think you necessarily just start coding and building on day one without anything; especially for a data product, you need to have some idea of where you’re going. You want technical people involved in the design process with the product manager, or whoever is playing that data product manager role, and the designer.

What do you think is the right way to build, if you’re building a custom enterprise data product or application and you have a nice clean slate? There’s a ton of data out there, but you don’t want to just build another tool to visualize all the data that’s in the warehouse or wherever it’s located.

How do you build a small increment of value when it might require a tremendous amount of plumbing just to get to step one, like, "Oh my gosh, we actually spit something out in a browser"? The amount of work it requires just to start getting data on the screen can be huge; I know that’s an engineering problem that happens on these large enterprise projects. It might take a while just to get something on a screen, so how do we deliver a small amount of value without getting too lost in big architectural discussions? Do you have any suggestions for how to do that with design and business in mind?

Mark: There’s a bunch of questions in there. I think you hit on some interesting things just in your choice of words. Like when you said "accessing a bunch of data and visualizing it," there’s a presumption that all I need to do is see the data and then my problem is solved, when data under glass is really the departure point for the end user to actually do something. The focus, if you’re designing any kind of data system, is: what is the action that is intended at the end of it?

And that action could be, "I’m using Tableau and I’m trying to understand a problem so that I can figure out what to do." There, the action is to inform or understand. Versus something a bit more dashboardy, where you’re working out what you need to know to measure the health of a business process and its operational status, and what you need to know to diagnose problems within it so that you can make decisions: do this, do that; change this, change that. Or data products, in the sense of something I used to work on for a bit, recommendations. Recommendations are very different depending on the type of thing you’re doing and the context, so you can’t just say, "I’m going to apply the same system or techniques that I used for music recommendations to retail recommendations." And that goes to the context. The way that you approach that actually looks—this is sort of surprising—sort of waterfally. It doesn’t look very agile, because of what you said: I don’t know which data I need.

The core root of your problem is, "What information do you need, in what frame (I use frame to mean the mental frame or the frame of reference), for what kind of problem?" What I see people doing repeatedly is actually succeeding first, before failing. They do one siloed problem, and they build a thing that gets data from five different places. One of them is data they never used before; fifteen years ago it might have been clicks, now it’s something else. They blend that together to either produce a service, or deliver information to somebody, or actually embed it as an analytic bit that then feeds back into a system.

That is successful because the bounding on it was narrow. The goal was fairly well-understood. The reason that these things fail is that people think they need to build an intergalactic data system to solve that problem. Step one, install a Hadoop cluster. Step two, feed massive amounts of data into it. Step three, build that data product or data pipeline or whatever it is, and people look at it like, “Wow, this works. This is fantastic.” Now, when you want to start another project, they say, “Well, we did this for department A. Let’s try this for this other problem over here.” You realize that you built a siloed, hyper-optimized, functionally-oriented system that solves exactly one problem.

The problem in our market is that handcrafting data pipelines to support individual things is exactly the pattern that we broke in the late 1980s with the data warehouse, because every single process in a mainframe basically took files, built pipelines, and produced output files that were the information that was needed. It’s a human-driven, human engineering problem which builds no smarts into it, because you didn’t get enough context to solve more than one or two problems at a time. That leads you to, "Oh, this is successful." You do it a second time, you do it a third time, and then the fourth time you start to look for commonalities, and you realize that, "No, 50% of the data is overlapping between these things, but the way we process them is different," and you build tangles. You end up with the big ball of mud architecture, to refer back to that famous paper. If I’m thinking academically, there are two great references: one is the big ball of mud architecture paper, and the other is the paper "Machine Learning: The High-Interest Credit Card of Technical Debt." They outline this in much more technical terms.

You have to do the thing you don’t want to do, which is get a broad enough view to establish the level of infrastructure support that you need, essentially to define the architecture. There’s a part which is Agile, which is the upfront exploratory pieces and the contextual construction of applications and data products, and there is a part which is foundational infrastructure, which is the data components that live underneath this. The fatal mistake that is made is thinking of it as a technology problem. "We can’t use databases because X. Their cost of storage is too expensive." I hear that all the time and it’s the stupidest thing I’ve heard. Cost of storage doesn’t matter. Your cheapest cost of storage is /dev/null: write once, forget about it. Whether you can retrieve it is what really matters. The reason some classes of system, content management repositories, data warehouses, are so expensive is the labor that goes into making retrieval fast and efficient, and it comes at the expense of making new information available, which becomes slow and inefficient. This is the actual problem that the Dewey Decimal System solved for books 100 years ago. That is what we need now.
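To make the storage-versus-retrieval tradeoff concrete, here is a minimal sketch in Python (the records and field names are hypothetical, not from the episode): an append-only pile of data is cheap to write, but every question becomes a full scan, while an inverted index does the library-catalog labor at write time so retrieval is fast.

```python
# A minimal sketch of the tradeoff, with hypothetical records: appending
# to a pile is cheap but every question becomes a full scan, while an
# inverted index pays labor at write time so retrieval is fast.
from collections import defaultdict

records = [
    {"id": 1, "text": "harvest yields were up in the north province"},
    {"id": 2, "text": "levy set after comparing harvest to last year"},
    {"id": 3, "text": "inventory counts for the grain warehouse"},
]

# Cheap write, expensive read: scan everything for each question.
def scan(term):
    return [r["id"] for r in records if term in r["text"].split()]

# Pay at write time instead: word -> record ids, the same move a
# library catalog makes for books.
index = defaultdict(set)
for r in records:
    for word in r["text"].split():
        index[word].add(r["id"])

def lookup(term):
    return sorted(index[term])  # no full scan at read time

print(scan("harvest"))    # [1, 2], linear in data volume
print(lookup("harvest"))  # [1, 2], ingestion already did the work
```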

If you don’t think about that problem, and the fact that you are not building a custom functional solution, you are making information available so that it can be remixed quickly to build the next thing and the next one. You need Agile and [dween 00:30:30] and exploratory work above the line, and carefully curated infrastructure, fast enough to support the accretion of new information and the cataloguing of it, below the line. If you don’t divide the problem, you are screwed. That’s why I think there were five years of successes and excitement around a lot of analytics, followed by the last couple of years of, "Gosh, this is expensive and things aren’t working out the way we expected."

Brian: I’m going to sound like my engineering leads and my clients: "We can’t afford to rebuild it again in the second iteration." The general sense is that there is a tremendous amount of lift in the first version to get to anything, and then after that you can make it better. We have this kind of ping-pong back and forth, like, "Well yes, you can develop something, but if it doesn’t generate any usage, and the usage doesn’t generate any decision support, and that doesn’t generate any value, then you just wrote code, and you built a software application that may not have a problem that it solves." But I can see the alternate point, which is, "Okay, we solved one or two problems here, maybe we did get an idea…" I’m trying to think of an example to make this concrete, so let’s take a fictitious example.

Let’s presume that in 10 years you walk outside and there are hundreds of drones circling your house, delivering packages and doing all this stuff, and maybe you’re a third-party drone service provider: we swap out propeller blades at the right time, or something, I don’t know, and they want to develop a service. We all know there’s probably tons of IoT telemetry available about every working part on the drones, the towers, the communications, and all that kind of stuff. You could say, "Well, our first problem is we want to predict when the propellers need to be changed out," and, whatever it may be, there’s a handful of tasks there. The engineering person is going to argue that there’s a ton of data you’ll need to gather just to get to the point where you can start doing that one prediction.

But the fear is going to be, "Well, this needs to turn into a product we can charge money for to these drone operators, so we need to have more than just that one thing, or we don’t have a commercially viable product, and that means we need to think bigger about the whole architecture at the beginning." The next thing you know, we’re spending all this time plumbing for all the possible drone and tower data, the whole system, before we’ve even solved that first problem, which is just propeller replacement, or whatever it may be.

Are you seeing what I’m saying? You can make the argument about the need to go understand the usage scenarios, the actual problems, and what the decision flow might look like for these users, to inform the initial engineering sprints, but there’s still that lift. Do you think it’s, "Yes, start with individual problems, solve those, and rework the architecture over time, even if by the fourth strike it’s a big lift"? Is that the way to go?

Mark: That is exactly the wrong way to go, because if you try to do that, that’s basically solving one problem at a time, focusing on the functionality of the problem rather than on the aggregate set of things that you need to do in the bigger picture. This is complex systems stuff. You need different sets of thinking tools around it. Just applying system dynamics and systems modeling as things to think with forces you into the broader context. Otherwise you start, and you’re like, "Okay, I need this information, I’ll slap it out here. And I need this information, I’ll slap it out here." You don’t have a framework for the information architecture. You end up with a big pile of data, which is a big part of what happened to a lot of people.

One of the big vendors in this space advocates building one system at a time but using these big clusters, just piling projects into them, and somehow, magically, all of the information you piled in is completely reusable. That’s a programmer-centric view of the world, because as a programmer, "I see XYZ, I figure out what I need, I build my thing." When you try to put that into the hands of a user, or you try to expand the scope of that across an organization, you end up with a giant collection of single-purpose things.

It’s sort of like trying to build a 100-storey office building and refactoring every few floors. Eventually, unless you figure out what you need to do, the technical debt that accrues will kill you. You have to understand, "Well, this is a 100-storey building; we’ve got to use steel girder construction and concrete, so we have to put that in place first." I think, though, that construction analogies are bad analogies. I think it’s better to think about infrastructure systems, municipal water: where does the water come from, where does it go, how is it being used? Because I look at data systems and I see the two parts, and I try to partition them.

One part is the data collection and provisioning infrastructure, which is common to all, at various levels of capacity. A 100-storey building needs big pipes; a single-family home needs little pipes. The second part is the application, and the application is where things like the Agile methods and the exploratory stuff to build data products come in. The infrastructure piece is something you’ve got to get right. What happens instead is that people go out to the lake and build a pipe that runs all the way to their house or their building, as opposed to investing in the water system and then breaking apart the problem of water consumption.

Changing BI tools is sort of like changing the faucets or the fixtures in your kitchen. It’s at the end of a very long chain of dependencies, and that chain is just like the data dependencies. If your problem is the kitchen sink, the faucet is one thing; a fire hydrant is a different thing, and bottled water is yet another thing. We tend to focus on the end of that system, the bottled water, and then work everything in the enterprise backwards to the data. That is what you don’t want to do, and that’s where, as I said, the waterfall piece kind of comes in, or we’ll call it sprint zero.

You have to survey the organization and look at what you’ve got, what you don’t have, what you need, and what the problems are around that. You have to focus on the business uses, the business cases, what’s feasible, what’s possible, and that gives you a pretty good grasp of the overall picture.

Where I see a lot of data product stuff go wrong, whether it’s in the startup world or in the enterprise, is in not doing that: skipping that first discovery phase that gives you the context and the landscape so that you see where you’re headed and what information you need. Because there’s going to be 50% overlap on a core set of information, and then there are going to be things that only one piece needs. Going into a warehouse, picking things, and putting them into boxes for order processing: those pick events are probably only useful to somebody who’s worried about the efficiency of picking operations inside warehouses. That’s a narrowly usable piece of data, but it’s tied in with all the product data, and the order data, and the other things, and that information is probably common across three-quarters of the organization.

Understanding these aspects of information overlap, and how one builds a framework around making it possible to supply both sets of needs simultaneously, that’s the kind of thinking where you have to sit back. It’s like that old Alan Kay quote: "You don’t Agile your way into a compiler." You have to know your methods, and know when you need to gather requirements and when you can skip the requirements because you’re exploring.

Brian: There are two things here. I guess I’d push back on one thing, and I would totally grant the other. I think that discovery phase is so often lost. Some of the people who need to be involved in it, to develop empathy, to understand who’s going to be using this stuff, day in the life, what it’s like to be the person who is ultimately going to end up using the thing you’re going to work on, are not always present. They’re very decoupled; they don’t have that empathy. I love that. We would typically call that UX research: going out to discover what the needs are before you’ve done anything. But if you can get the engineering people or the data people especially, whoever the SMEs are about the data and the analytics, involved in some of that process so they can understand that world a little bit before any code has been written, I think that’s good insurance for the project.

You really have two choices, as I tell my clients: we can design on assumption or we can design on fact. Now, you may not have all the facts, but it’s a choice, and one is higher risk. Designing on assumption, or just using some designer’s opinion about what it should be based on them talking to you, you might get lucky. That’s probably better than just taking a wild-ass guess on your own, but it’s not as good as going out and spending some time. "Oh, we don’t have time to do research." It’s like, "How can you afford not to do it?" You’re about to spend millions of bucks on this thing. So I totally agree with that.

But one thing that concerns me is that you do some of that stuff, you then get into the weeds, and the next thing you know, "Tell me how much tax to charge for the crops in the coming year" kind of got lost, and it’s still really hard to do that by the time the product comes out. There’s not a black-and-white answer to this, so it’s not like, "Mark Madsen, tell us the…"

We’re having a discussion here, but I think the fear is that we can sometimes lose sight of what I would call the benchmark success criteria. Maybe you have these 8-10 problems, as you call them, like the pipelines of the older computer systems, which shot out a file that had just what you needed in it. I would say we need to ship an experience in the tool that’s good for each one of those. It doesn’t mean there’s a wizard for every single one of these, necessarily, but you do need to have some kind of criteria by which you are going to qualitatively measure the success of the user experience, the usability of the system. Or else, again, you risk just writing code and having this big platform.

But at the end of the day, at the last mile, the faucets don’t work well, or, "Well yes, water comes out, but it drips, and you need to fill your gallon container, and it takes an hour to make a pot of coffee," or whatever. I don’t know, any comments on my rant? I guess that was a rant.

Mark: I like that: we might get lucky. Designing in the absence of any requirements works if you can be the proxy for the person who is on the receiving end of it. But if you can’t, that’s where it gets totally random, and that’s where all those discovery sessions and understanding the context come in. I love day-in-the-life kinds of exercises; they’re my fave. I just did one yesterday. In our world, think of somebody who is going to be on the receiving end of, I don’t know, some data product or data system. Let’s take recommendations: what’s the context in which they’re doing things? There’s much more of a passive recipient side to that.

But if you said, "I want to build an environment for a data scientist to do their work," that’s a very complex environment, and there are all these other people involved because the task crosses many domains. So I like day-in-the-life exercises because they show you, "I wanted to do this, and in order to do this, I had to do that, but in order to do that…" That was the old developer meme from a few years ago about yak shaving. You’re sitting there staring at the yak, wondering why you’re shaving the yak, and thinking back on the long chain of terrible consequences of things that had to be done in order to do the thing you really wanted to do. That’s actually where a lot of users are in organizations.

I find that a lot of times people like to blame the developers, but nobody ever educated the developer or the data guy in how to approach these kinds of problems. Universities have failed at this; everybody focuses on computer science stuff, or now, with data science, they focus on math. Nobody focuses on the whole problem. When you go out and do these things, if you do them appropriately, and that’s the trick, you’re talking to somebody about the problem they’re solving: "Well, why are you worried about raising taxes this year? Why don’t you just do what you did last year?" "Well, because we have a war coming." "Okay, well, if you have a war coming, and you need to make spears, then you’ve got a bunch of things you need to think about." So you’re asking these sorts of what-next, what-before, what-after questions, and all of those things flesh out the understanding. I think that puts the understanding into the developer to make better design decisions. That’s why I feel the UX stuff, starting exercises from the complete end-user view and removing a lot of the technical aspects from the conversation, helps so much.

The last mile is the key to the success or failure of a lot of information-driven systems. That right there is what you started with: the right tool is the one that people will use, not the one you want them to use, which is how IT thinks of their role. "You use what we built and bought for you. Eat your vegetables, they’re good for you."

Brian: Totally. You’re spot-on. I’ll tell you, there’s nothing like a developer or a stakeholder who has seen the light, and they’ve either watched someone suffer through the crappy thing that they made, or someone else’s crappy tool, or they’ve simply spent some time and a light went on. Did you have any particular thing that was illuminated from the session, the discovery you did yesterday? That’s one of the most exciting things for me about being a designer: when you find this nugget of stuff that no one has talked about, and you’re like, "Wow, I had no idea that you have a team of…" There are like four people you need to talk to to do this, and we think we’re building a self-service tool for this, and there’s an approval chain, and you have to send this data to this other thing and it comes back, and you’ve got to share it with this other person. Wow. Head exploded, but in a good way, like, "Oh my gosh, we can totally solve all of this, but we never knew that this was even a problem." Did you have any moment like that yesterday in your session?

Mark: There were a couple, probably. It happens almost every time, because there’s always some bit of context. Maybe one person on your team knows it and the other five don’t, and it’s just assumed. Everybody just sort of assumes, or they’re unaware, so fostering that surfaces it; sometimes it calls assumptions into question. The discovery that, "Wait a minute, we’re building all of this stuff into our product to do X. But do we really, actually need to do X? Because most of the time, in this context, it’s going to be done over here, not over there." That completely changes what the product ought to do, or what the data should be, or whatever it is you’re building. That’s the sort of thing that comes up, and that changes your engineering efforts.

Everybody talks about self-service data integration in order to do things like build data products. You have a data engineering team and they work on this stuff, but you want self-service so that analyst types and data scientists can do a lot of it themselves. Then you build a system which is only amenable to developers rather than those folks, which happens all the time. There are all these assumptions about resources, where you can do things, how you can do things, what skill levels people have, and where they see the value of their time.

I think one of the big things for me, years ago, was an internal user survey I did. I was running two teams: a business intelligence team and an analytics team. The analytics guys were doing consumer research, digging into people’s behavior and what they do, and the other was just the core business intelligence for the organization. I was really struggling with some of the contextual aspects of this. I talked to all of these people, so I kind of knew the context, but I didn’t know how much time they spent every day on various tasks.

We did some task studies, nothing really formal; in fact, they were driven by interns. We armed interns with a piece of paper and a pencil, or a spreadsheet, and sent them out to talk to people, to look at what they do, how often, how frequently, and for how long. And you find that the average business intelligence tool, or Tableau dashboard, or whatever it is you’re using, gets 15 minutes. In our organization, for the bulk of people, the median was 15 minutes.

If you only use a tool for 15 minutes a day, that’s not enough time to really become proficient or learn how to do most things, so you’d better design the experience around that, much more tightly than for the small number of people who spend a lot more time in it. But developers, and I include professional analysts in this, tend to presume that other people use these tools a lot more than they do. So, "Oh, it’s easy to use this tool. Just do X, Y, Z." It’s the curse of knowledge right there. They’re so familiar because they spend four hours a day doing it, so it should be obvious to someone who does it, you know, 10 minutes every other day.

That sort of last-mile problem, to me, was figuring out what those environments needed to look like based on one single fact: "How much time do you spend interacting with this system?" If 10 minutes out of 8 hours in a day is all you do, do you view that as important? That is one of the most interesting things that leads to the success or failure of basic give-data-to-the-end-user systems. Unless they see the value of the information, the KPIs, whatever it is you’re delivering to them, they view it on the basis of, "I need to get to this meeting, and I need to do this stuff; that is unimportant, that is probably one of the least important things," and what they want you to do is minimize that time from 15 minutes to five.
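Here is a minimal sketch of the kind of measurement Mark describes, getting from raw instrumentation events to "how long do people actually spend in the tool." The event data, field names, and the 30-minute idle gap are all assumptions for illustration, not details from his study.

```python
# A minimal sketch: sessionize usage events per user, then take the
# median daily time in the tool. All data and thresholds are hypothetical.
from datetime import datetime, timedelta
from statistics import median
from collections import defaultdict

events = [  # (user, timestamp) pairs from instrumentation
    ("ana", datetime(2019, 1, 7, 9, 0)), ("ana", datetime(2019, 1, 7, 9, 12)),
    ("ana", datetime(2019, 1, 7, 14, 0)),
    ("bo", datetime(2019, 1, 7, 9, 0)), ("bo", datetime(2019, 1, 7, 12, 55)),
]

GAP = timedelta(minutes=30)  # a new session starts after 30 idle minutes

per_user = defaultdict(list)
for user, ts in sorted(events, key=lambda e: (e[0], e[1])):
    per_user[user].append(ts)

daily_minutes = []
for user, stamps in per_user.items():
    total = timedelta(0)
    session_start = prev = stamps[0]
    for ts in stamps[1:]:
        if ts - prev > GAP:           # close the session at the idle gap
            total += prev - session_start
            session_start = ts
        prev = ts
    total += prev - session_start     # close the final session
    daily_minutes.append(total.total_seconds() / 60)

print(median(daily_minutes))  # the "15 minutes a day" kind of number
```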

Brian: Yeah, I’d say, broadly speaking, take any time-based stuff in UX with a grain of salt, because sometimes more time spent can be good, more time spent can be bad, and less time spent can be good or bad as well. You need the qualitative side of it; you need to understand the context. Are they in and out to solve a specific problem, "What is the current value of X for this report? Okay, got it, it’s 92.6"? Or is it, "I need to tell them which department we should spend more money on next year"? That’s a different thing. It sounds like you did what we call a diary study in the UX world, where people self-record their usage of the tool. Is that what that was?

Mark: We did two things. One, we instrumented the system so that we could see how long they spent doing which types of activities, like looking at a dashboard, drilling down into some metrics, running a report, running a query, so you could see what they did and what they did most frequently. The other was qualitative; it was sending people out to ask, "Okay, you looked at this. What were you doing? Why were you looking at that?" to get a more complete picture. This is a really good point about the time aspect, because sometimes the answer is that they need to spend more time, but it’s too hard. It’s like playing a video game where you get stuck at the same point every single time.

Years back, I worked on a video game system where we were instrumenting the games to understand what was going on, which is common practice today in multi-user games. One of the things you do is find the places where people get frustrated and quit. They get stuck, it’s too hard. But in a multi-user game driven off of servers, as opposed to one installed via CD on a PC, you can change these things and make them slightly easier. In one case, we were looking at a racing game, and there was a particular sequence at one point that people just couldn’t get through. If you didn’t master it, you got frustrated, and you could see it, because you take all the user activity, map cohorts of people, and look at the patterns of gameplay. In a lot of these systems you can, of course, create Skinner boxes, but the idea is to try to maximize that play time and keep them playing; if it’s too frustrating, they quit.
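A minimal sketch of that drop-off analysis might look like the following. For each checkpoint in a level, compute what fraction of players who reached it never got past it; the checkpoint names and data are hypothetical, not from Mark's racing game.

```python
# A minimal sketch of finding the frustration point: for each checkpoint,
# of the players who reached it, how many abandoned the game there?
from collections import Counter

# The last checkpoint each player ever reached (hypothetical data).
last_seen = ["cp1", "cp2", "cp2", "cp3", "cp2", "finish", "cp2", "finish"]
order = ["cp1", "cp2", "cp3", "finish"]  # checkpoints in level order

ended_at = Counter(last_seen)

remaining = len(last_seen)  # everyone reached the first checkpoint
for cp in order:
    quit_here = ended_at[cp] if cp != "finish" else 0  # finishing != quitting
    print(f"{cp}: reached={remaining}, abandoned here={quit_here}, "
          f"rate={quit_here / remaining:.0%}")
    remaining -= ended_at[cp]

# Output shows cp2 as the sticky point (4 of 7 who reached it quit there),
# which is the pattern you would then go make slightly easier.
```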

We did that right around the time that Raph Koster wrote one of my favorite books, A Theory of Fun; it’s really called A Theory of Fun for Game Design. We actually used it as part of the design bible for how one does end-user data delivery. The idea of skill plateaus is not something that most application developers think of, because they think they’re building an application with a specific function. A lot of data systems are really tools for people to accomplish a goal, not systems that embody the goal itself. That book has tons of great design and experience advice on how to build systems that successively reveal complexity, so that as you get better, the experience becomes richer, but you’re still capable of working in that environment. That’s a very hard nonfunctional requirement for people to design towards.

Brian: That’s a great example, the game analytics there. This is, again, something that sometimes my clients have trouble with: there are pie-in-the-sky, what I would call very-few-word business goals, like "Drive revenue," and they’re so non-design-actionable. This is a great example. One could be, we want to increase gameplay, or more specifically, maybe we want to increase gameplay by 5%-10% in terms of time.

The scenario for the internal tool or service you need to build may be, "We need to find points in the game where gameplay is too difficult and people abandon." That’s your service, and then the tasks might be, "I log in, and I want to see: has anything changed since last time? Are the sticky points still in the same place?" Theoretically, you might have made some changes, so you probably want to understand, "Is it still hard or not? Do we need to spend more time revisiting it or not? Where are those places?" Then the next question may be, "Why is it hard? What is going on? What does the data say?" Ideally, the system could generate some conclusions for you and provide evidence as backup, but maybe in your MVP you just provide some kind of evidence and they have to conclude the why part of it. But that’s how you take this pie-in-the-sky goal of "increase gameplay" down to something really specific, and understand what’s going to go into it from an experience perspective. That’s really cool. I didn’t [...] they did that, but I don’t work on games, so that’s pretty cool. I didn’t know they’re all doing that now; that’s neat.

Mark: They’re pretty much all doing that now, any multi-user game, or even mobile games; you’ll see it in mobile app design, too. I didn’t work on them, but I would presume that popular games like Candy Crush or Angry Birds dial in that stuff. I know the guys who were building Angry Birds had a lot of telemetry on these things, but I don’t know what they did with it.

It’s the collection of those things that you just described. First of all, you described the thread from the top to the bottom, which is how you figure out all the information for a particular use case. The other part is that the collection of 20 or 30 of those, across different points in the organization, gives you the shape of the data space and the kinds of things, in terms of capacity and capability, that you need. One part of it feeds the application layer, which is what you are trying to do, enable, and support, and the other part feeds the infrastructure component, and now you have enough information to guide the lower levels, the platform space for the data work. Those two things go hand in hand.

Brian: Wow, man, this has been fun. We could probably go on for hours, but I don’t want to take up too much more of your time. Are there any concluding thoughts if someone were to walk away from here? You have a lot of experience in this space. If I wanted to get better at designing good enterprise data products, is there any particular advice you might give to a data product manager, an analytics leader, or a data science manager? Everyone’s intentions are good. They’re all trying to develop better services, but is there a core message you would give to the listeners?

Mark: Really, over the years, we’ve put different labels on things, but I think the key point is that starting point of the goal. Looking back on the conversation about failure, there are companies that have spent hundreds of millions of dollars putting in big data and analytics infrastructure. They spent all that money without really knowing what the goal was. In data science, the problem is exactly the same. A bad data scientist doesn’t take what you described as a problem and turn it into something actionable.

"Increase margins." Well, we could increase margins by decreasing costs or increasing sales; which way do you want to go? You play that game of chasing it down, and the art of building an analytic model is very similar. You have to come up with an evaluation criterion for the model, and it has to be concrete and explicit. If you don’t know the usage you are heading towards, then it’s sort of like the saying: if you don’t know where you’re going, any road will do. That’s the challenge.
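To make "concrete and explicit evaluation criteria" tangible, here is a minimal sketch (the numbers and the cost/revenue assumptions are hypothetical, not from the episode): instead of scoring candidate models on generic accuracy, score them on the margin the business actually cares about.

```python
# A minimal sketch: evaluate models on an explicit business criterion
# (margin) rather than accuracy. All figures are hypothetical.

def margin_score(decisions, outcomes, revenue_per_hit=50.0, cost_per_action=5.0):
    """Margin from acting on the model's positive predictions."""
    total = 0.0
    for act, converted in zip(decisions, outcomes):
        if act:
            total += (revenue_per_hit if converted else 0.0) - cost_per_action
    return total

# Two hypothetical models' decisions on the same customers:
outcomes = [1, 0, 1, 0, 0, 1]   # who actually converted
model_a  = [1, 1, 1, 1, 1, 1]   # acts on everyone (high recall)
model_b  = [1, 0, 1, 0, 0, 0]   # acts selectively (high precision)

print(margin_score(model_a, outcomes))  # 3 hits * 50 - 6 actions * 5 = 120.0
print(margin_score(model_b, outcomes))  # 2 hits * 50 - 2 actions * 5 = 90.0
# The explicit criterion, not a generic metric, decides which model wins.
```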

That’s why I thought it would be interesting talking with you, from a sort of UX perspective, even though these days most of my time is spent building plumbing. The reason it’s all about plumbing is that everybody wants fixtures, and then starts with a fixture and runs it all the way back. It’s like rewiring your house every time you buy a new toaster.

Brian: I’m still picturing a four-foot sewer pipe running from the pond down to my house. I hope that doesn’t break or get clogged.

Mark: In IT we’ve got all of that. That, plus the rewiring of the house, plus we rebuild the house every year because we’re adding another floor.

Brian: This is super fun. Mark Madsen, where can people find you online? Are you on any of the Twitters and the social medias and the interwebs? Where are you out there?

Mark: These days I’m not out there that much, because I’m not really doing much that’s public anymore, but Twitter is one place, which is mainly just random things I find interesting. And conferences; there are always the O’Reilly and [...] conferences, because I like doing conferences.

Brian: And what’s your Twitter handle?

Mark: @MarkMadsen, yeah. And the other thing is, whenever I do something that’s speak-worthy, I post it to SlideShare.

Brian: Well, thanks so much for coming on here; it’s been great. I’ll put some links in for our listeners. Maybe we’ll have a clay tablet link and a big ball of mud link; there are a lot of mud and dirt themes going on in this episode! I’ll also try to put a link to the high-interest credit card of technical debt paper; there’s some good stuff here. And I’ve got to check out that design bible, A Theory of Fun for Game Design; that sounds cool. Thanks for those recommendations, and thanks for coming on.

Mark: You bet! Thank you for having me.

Brian: Yeah, alright. See you later.
