Friday, 13 December 2013

Just been to the SAS Professionals road show at the new SAS office in London

A couple of days ago I attended a SAS Professionals event focusing on SAS V9.4, which is due to be launched in Europe in January 2014 (with statistics 12.3, 13.1 to follow towards the end of the year). As usual there is so much new terminology to learn and new paradigms to get one's head around. Naturally I concentrated on what really interests me - analytics. But there are some non-analytics things that might interest analysts such as myself:
  1. SAS has significantly hardened the security.
  2. There are a few new ODS destinations aimed at the mobile device world. But the one that is, to me, the game changer is the ODS destination for MS PowerPoint, which completes the suite of preferred delivery platforms. Let me spell this out: a good SAS programmer can automatically create sleek PDFs, Excel workbooks and PowerPoint decks. Now it can all also be zipped automatically with an ODS option.
  3. SAS has introduced two new scripting languages: FedSQL and DS2. The latter, DS2, is something every self-respecting SAS programmer should know. It harks back to the AF object-oriented SCL (oh, the good old days), so SAS dinosaurs like myself will feel at home. The power, according to the presenters, is the ability to truly harness parallel programming and to code objects that are truly portable to other environments. We are just facing a case where we could have benefited from the latter feature - we created an amazing solution and now the client wants the beautiful data steps dumbed down to SQL. In the new world we can just hand over the DS2 and it will work as is in, say, Oracle.
  4. The IT-oriented people will be thrilled with the new embedded web server (save some licensing money there) and the shiny new SAS Environment Manager.

On the analytics side the most interesting development I noted was the High Performance procedures. They are designed for MPP environments doing true parallel in-memory processing. They come in bundles focusing on statistics, econometrics, optimisation, data mining, text mining and forecasting. It seems that the rewritten engines also perform significantly better on SMP environments (you know, the PCs and servers we are using). In essence the technology uses the hardware better than ever, as long as you have more than one core and enough memory assigned to you. A small but useful set of HPxxx procs will be included in Base SAS if one licenses other statistically oriented packages (STAT, OR, ETS, Miner …). It would be interesting to stress test them on an SMP environment and figure out the optimal settings.

It seems to me that most of the new features discussed for EM 12.3 are features that were there in 2.0 to 4.0 but disappeared in the move to the thin client in 5.0, such as enhanced control over decision trees. A new and interesting addition is survival data mining, which introduces time-varying covariates.

I will definitely have to look deeper into:
  • SAS Contextual Analysis
  • Recommendation engine

One interesting observation is that not many chose to go to the analytics session; most went to the BI and Enterprise Guide ones. Am I one of a dying breed? Or is it that all the SAS statistical programmers are so busy they do not have time to come to events such as this?

Tuesday, 19 March 2013

Is there a business case for underpinning strategic human capital planning with advanced numerical analytics?

Too many managers hasten to respond negatively to the question I posed in the title before fully understanding the terminology. Evidence-based decision making will never replace good old intuition, gut feeling or back-of-a-fag-packet decisions. To get these right you have to be brilliant and lucky. Even if you are, you have taken care of the high level but not of the details. An experienced architect will be able to tell you immediately during a site visit that there are several ways to build a bridge and propose an off-the-cuff strategy (say a metal suspension bridge). Even if we do not explore other options for building the bridge, we cannot (and should not) proceed without detailed plans and costings. But that is exactly what happens again and again when companies make decisions about their most important resource – their people.

Most managers associate strategic human capital planning with figuring out how many people are required to perform a task. For instance, how many level 2 engineers are required to handle expected peak demand for boiler repair call-outs? This could be refined by engagement types and cost. Although this could address the immediate need and ensure a good service level, the long-term effects are not considered. For instance, the future burden on the pension pot, the expected strain on the training centre due to high employee churn, and career funnel bottlenecks should be evaluated and quantified. And here lies the business case – putting your finger on the long-term (usually hidden) costs that could be avoided.

A good strategic plan needs a representative sandbox. Analytical tools such as predictive modelling (what would be the demand?), simulation (this is how it works today) and optimisation (what options should I consider?) should be used in highly complex situations where the impact of a decision is multifaceted. For example, it is straightforward to expect that restricting an aeroplane mechanic to one hangar will result in lower utilisation rates. However, the impact on the number of pilots required due to filling in for colleagues waiting for planes to be serviced is not linear and is co-dependent with several other levers that could be set at different levels.

Taking time out from the whirlwind of fire-fighting to look at the bigger picture is imperative for the business's long-term health.

Sunday, 10 February 2013

Are great data scientists really appreciated?

I could not agree more with Thomas C. Redman’s post “What Separates a Good Data Scientist from a Great One” (Harvard Business Review, January 28, 2013). I would like to suggest that sometimes it is not just down to the traits of the person doing the job. There is also an element of the company culture and environment. It got me thinking of my past experience, where the same people sometimes did great work and sometimes just work. You can employ the best data scientist in the world, but are you allowing her to be one?

Redman discussed four traits: a sense of wonder, a certain quantitative knack, persistence, and technical skills. Some of the commenters suggested that business acumen, courage, and being a mathematician and a programmer should be added to this list. Interestingly, attention to detail was not directly mentioned. So what is an environment that is conducive to great data science work?

Good data scientists are allowed to become great when the people they work with and for understand the importance of this type of investigation and realize it is an R&D approach. I have seen many situations where the data scientist was working in a ‘consulting firm’ role, i.e. the role was defined as providing a service to the business units. This, in itself, is a very good model which I like very much, as it ensures a deep understanding of the business and cross-fertilisation of ideas. The difference between good and great is in the way work is prioritised and the time allocated for its completion. On the one end of the scale, the data scientists are allowed only to respond to work requests, sticking to the defined scope. This will reduce the best data scientist to a BI programmer; and trust me, it is very easy to fall into this path-of-least-challenge attitude. Everybody is happy but the point is lost.

On the other end of the scale we have the ‘please do not bother us with trivia’ strategic thinkers who work in an academic mode on work that comes only from C-level managers and are given milestones that are months apart. To be able to pull that off one needs to be a really super data scientist working with a dream team of C-level managers. Too often I have seen these teams losing the plot for lack of tension and focus.

As always, the correct balance has to be struck. I worked in such an environment, where we mainly provided straightforward analytics (and yes, BI) to the business units but we were also given space to suggest investigations of our own. The culture was one of ‘go ahead and try it; if it does not work we still learnt something’. More importantly, the top managers made a healthy distinction between a simple delivery of the findings and a simple approach (what I call the sat-nav model, where the device provides a simple interface to a very complex solution). The atmosphere changed when a reorganisation brought in a new management that didn't see the value of doing more than one was asked for and of spending time on investigating alternative analytical approaches. I think they have now reverted to the stone age of doing forecasts using univariate linear regressions in Excel.

To pull one’s team away from either end of this continuum, the manager of data scientists must be persistent, as suggested in the original post, but also a good communicator who can build trust in the quality and importance of the analysis.

Wednesday, 30 January 2013

What do you expect of data mining software?

I just completed the 2013 Rexer Analytics data mining users survey. I make it a point to do my best to complete these surveys as they usually make me stop and think. This year there were two questions that were very relevant.

I am just finishing off a nice project for an international retailer that brought me in to run the process of choosing a Marketing Intelligence Platform (in plain English: to choose data mining software and figure out how it should be deployed). One of the most interesting challenges of writing the RFI and RFP was agreeing with the business what was important to them. I found it pleasing that most of the points I put forward for discussion were listed in one or two of the questions in the survey. I will hold off voicing my very strong opinions until the survey results are published.

I am also very curious to see the results of the survey with regard to software preference and how the response has varied over time (this is one of the constant questions). During the process of engaging with the software providers and researching the web I have come to realise how much this arena has changed in just the last few years. I believe that the abundance of solutions might, to an extent, lead to software selection paralysis. It is important not to drop the ball: remember why your organisation is looking into data mining and clearly define what you expect the software to deliver (back to the original question above).

Do your bit and complete this survey – let's see how the responses pan out.