Just gave a brown bag session on learnings from my career as
an analyst. Or as I put it, learning from my mistakes. While preparing for the
talk I realised that it could be distilled into one word: “Curiosity”. At a
certain point in my career I started noticing that the breadth of my experience
is an asset. Many people will recommend not moving around too much and focusing
on one technology or methodology. That was not the case with me. My natural
curiosity led me to develop technical skills in data retrieval, reporting and
statistical analysis using different platforms, and I was open to learning new analysis methodologies. No, strike that - I am still open to learning. On my laptop you will find R with loads of packages I have tried out, alongside all sorts of evaluation versions of software solutions I am exploring. I do it because I am curious. When I read an acronym somewhere, I want not only to break it down into its words but also to understand what lies behind them. So my advice to budding analysts: be curious. Pick up as many technical and analytical skills as you can – do not wait to be sent on a course or to be
allotted time to deep dive. You never know when opportunity calls.
Wednesday 26 November 2014
Wednesday 1 October 2014
Dell has taken over Statsoft
Yesterday I attended a meetup event organized jointly by the
OR society and Dell-Statistica discussing the use of predictive analytics to
better patient care. The topic is very interesting and the discussions were
very lively. But the big news for me was to discover that Dell have purchased
Statsoft and are now promoting Statistica. I hope they do not repeat the
mistakes IBM made (and is still making) when it took over SPSS. For starters, Statistica is better than SPSS on several levels. One of them is that, like sas, it has a strong data management capability of its own; to do real stuff with SPSS you need to link it to some overly expensive IBM product. Last time I looked you also got more bang for your buck compared to SPSS – i.e. more functionality. I am refraining
from fully comparing it to sas as I believe that in Europe it is irrelevant due
to the sas pricing model. Even if sas is better by far than any other solution,
most organisations in Europe, and the Far East for that matter, will struggle to
compile a business case for the expenditure. If you can afford it, sas is still
my first choice; however, R and Statistica are close behind. I will be watching Dell to see how they position the software and analytics services, crossing my fingers that they manage to find a way to turn it into a cohesive offering (like sas) quickly, rather than the hodgepodge of mixed messages one gets from IBM.
Friday 13 December 2013
Just been to the sas professionals road show in the new sas office in London
A couple of days ago
I attended a sas professionals (http://www.sasprofessionals.net/)
event focusing on sas V9.4, which is due to be launched in Europe in January 2014 (with the statistics products at 12.3, and 13.1 to follow towards the end of the year). As
usual there is so much new terminology to learn and new paradigms to get one's
head around. Naturally I concentrated on what really interests me - Analytics.
But there are some non-analytics things that might interest analysts such as
myself:
- Sas has significantly hardened the security.
- There are a few new ODS destinations aimed at the mobile device world, but the one that is, to me, the game changer is the ODS destination for MS PowerPoint, completing the suite of preferred delivery platforms. Let me spell this out: a good sas programmer can automatically create sleek PDFs, Excel files and PowerPoints, and now it can all also be zipped automatically with an ods option (see the sketch after this list).
- Sas has introduced two new languages: FedSQL and DS2. The latter, DS2, is something every self-respecting sas programmer should know. It harks back to the object oriented SCL of the AF days (oh, the good old days), so sas dinosaurs like myself will feel at home. The power, according to the presenters, is the ability to truly harness parallel programming and to write code objects that are truly portable to other environments. We are just facing a case where we could have benefited from the latter feature - we created an amazing solution and now the client wants the beautiful data steps dumbed down to SQL. In the new world we can just hand over the DS2 and it will work as is in, say, Oracle (a minimal DS2 sketch follows this list).
- The IT oriented people will be thrilled with the new embedded web server (save some licensing money there) and the shiny new sas environment manager.
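To make the PowerPoint point concrete, here is a minimal sketch of the kind of program I have in mind (the file name and chart are invented for illustration, and I have not stress tested this against a production 9.4 install yet):

ods powerpoint file="monthly_review.pptx";
title "Average height by sex (sashelp.class)";
proc sgplot data=sashelp.class;
   vbar sex / response=height stat=mean;
run;
ods powerpoint close;

And a minimal DS2 sketch, just to show the shape of a program (the data set, variables and scoring formula are placeholders, not anything we actually shipped):

proc ds2;
data work.scored / overwrite=yes;
   dcl double risk_score;
   method run();
      set work.customers;
      /* placeholder logic - the point is the method/enddata structure */
      risk_score = 0.3*tenure + 0.7*balance;
   end;
enddata;
run;
quit;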
On the analytics
side the most interesting development I noted was the High Performance
procedures. They are designed for MPP environments doing true
parallel-in-memory processing. They come in bundles focusing on: statistics,
econometrics, optimisation, data mining, text mining, forecasting. It seems
that the re-written engines also perform significantly better on SMP environments (you know, the PCs and servers we are all using). In essence the technology uses the hardware better than ever, as long as you have more than one core and enough memory assigned to you. A small, but useful, set of HPxxx procs will be included in Base sas if one licenses other statistically oriented packages (stat, or, ets, miner …). It would be interesting to stress test them on an SMP environment and figure out the optimal settings.
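As a minimal sketch of how I imagine driving one of these on an ordinary SMP box (the data set and variables are made up for illustration; the performance statement is where the threading is controlled):

proc hplogistic data=work.churn;
   class segment;
   model churned(event='1') = tenure monthly_spend segment;
   performance nthreads=4 details;   /* spread the work over 4 cores and report timings */
run;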
It seems to me that most of the new features that were discussed for EM 12.3 are features that were there from 2.0 to 4.0 but disappeared in the move to the thin client in 5.0, such as enhanced control over decision trees. A new and interesting addition is survival data mining, which introduces time-varying covariates.
I will definitely have to look deeper into:
- Sas Contextual Analysis
- Recommendation engine
One interesting
observation is that not many chose to go to the analytics sessions, preferring the BI and Enterprise Guide ones instead. Am I a dying breed? Or is it that all the sas
statistical programmers are so busy they do not have time to come to events
such as this?
Tuesday 19 March 2013
Is there a business case for underpinning strategic human capital planning with advanced numerical analytics?
Too many managers hasten to respond negatively to the question I posed in the title before fully understanding the terminology. Evidence-based decision making will never replace the good old intuition, gut feeling or back-of-a-fag-packet decisions. To get these right you have to be brilliant and lucky. Even if you are, you have taken care of the high level but not of the details. An experienced architect will be able to tell you immediately during a site visit that there are several ways to build a bridge and propose an off-the-cuff strategy (say, a suspension bridge). Even if we do not explore other options for building the bridge, we cannot (and should not) proceed without detailed plans and costings. But that is exactly what is happening again and again when companies make decisions about their most important resource – their people.
Most managers associate strategic human capital planning with figuring out how many people are required to perform a task, for instance how many level 2 engineers are required to handle the expected peak demand for boiler repair call-outs. This could be refined by engagement type and cost. Although this addresses the immediate need and ensures a good service level, the long term effects are not considered. For instance, the future burden on the pension pot, the expected strain on the training centre due to high employee churn, and career funnel bottlenecks should be evaluated and quantified. And here lies the business case – putting your finger on the long term (usually hidden) costs that could be avoided.
A good strategic plan needs a representative sandbox. Analytical tools such as predictive modelling (what will the demand be?), simulation (this is how it works today) and optimisation (what options should I consider?) should be used in highly complex situations where the impact of a decision is multifaceted. For example, it is straightforward to expect that restricting an aeroplane mechanic to one hangar will result in lower utilisation rates. However, the impact on the number of pilots required, due to filling in for colleagues waiting for planes to be serviced, is not linear and is co-dependent with several other levers that could be set at different levels.
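As a minimal sketch of what such a sandbox could look like (in sas, since that is my home turf; every number below is an invented assumption rather than a real client figure):

data sandbox;
   call streaminit(20130319);
   n_engineers = 40;            /* scenario lever: size of the level 2 pool */
   weekly_capacity = 25;        /* assumed call-outs one engineer handles per week */
   do rep = 1 to 5000;          /* Monte Carlo replications of a peak week */
      demand = rand('poisson', 950);   /* assumed peak weekly demand for call-outs */
      shortfall = max(0, demand - n_engineers*weekly_capacity);
      output;
   end;
run;

proc means data=sandbox mean p95 max;   /* how often, and by how much, does the pool fall short? */
   var shortfall;
run;

Even a toy like this makes the conversation concrete: move the levers and the hidden costs start to show up as numbers rather than opinions.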
Taking time out from the whirlwind of fire-fighting to look at the bigger picture is imperative for the business’ long term health.
Sunday 10 February 2013
Are great data scientists really appreciated?
I could not agree more with
Thomas C. Redman’s post “What Separates a Good Data Scientist from a Great One”
(Harvard Business Review, January 28, 2013).
I would like to suggest that sometimes it is not just down to the traits of the
person doing the job. There is also an element of the company culture and environment.
It got me thinking of my past experience, where the same people sometimes did great work and sometimes just work. You can employ the best data scientist in the world; but are you
allowing her to be one?
Redman discussed four traits: a sense of wonder, a certain quantitative knack, persistence, and technical skills. Some of the commenters suggested that business acumen, courage, being a mathematician, and being a programmer should be added to this list. Interestingly, attention to detail was not directly mentioned. So what is an environment that is conducive to great data
science work?
Good data scientists are allowed
to become great when the people they work with and for understand the importance of this type of investigation and realize it is an R&D approach. I have seen many situations where the data scientist was working in a ‘consulting firm’ role, i.e. the role was defined as providing a service to the business units.
This, in itself, is a very good model which I like very much as it ensures a
deep understanding of the business and cross fertilisation of ideas. The
difference between good and great is in the way work is prioritised and the
time allocated for its completion. At one end of the scale, the data scientists are only allowed to respond to work requests, sticking to the defined scope. This will reduce the best data scientist to a BI programmer; and trust me, it is very easy to fall into this path of least challenge. Everybody is happy but the point is lost.
At the other end of the scale we have the ‘please do not bother us with trivia’ strategic thinkers, who work in an academic mode on work that comes only from C-level managers and are given milestones that are months apart. To be able to pull that off one needs to be a really super data scientist working with a dream team of C-level managers. Too often I have seen these teams losing the plot for lack of tension and focus.
As always the correct balance has
to be struck. I worked in such an environment, where we mainly provided straightforward
analytics (and yes, BI) to the business units but we were also given space to
suggest investigations of our own. The culture was one of ‘go ahead and try it; if it does not work we will still have learnt something’. More importantly, the top managers
made a healthy distinction between a simple
delivery of the findings and a simple
approach (what I call the sat-nav model where the device provides a simple
interface to a very complex solution). The atmosphere changed when a
reorganisation brought in a new management that didn't see the value of doing
more than one was asked for and spending time on investigating alternative
analytical approaches. I think they have now reverted to the stone age of doing forecasts using univariate linear regressions in Excel.
To pull one’s team from either of
the edges of this continuum the manager of data scientists must be persistent
as suggested in the original post but also a good communicator who can build
trust in the quality and importance of the analysis.
Wednesday 30 January 2013
What do you expect of data mining software?
I just completed the 2013 Rexer Analytics data mining users survey.
I make it a point to do my best to complete these surveys as they usually make
me stop and think. This year there were two questions that were very relevant.
I am just finishing off a nice project I did for an international
retailer that brought me in to run the process of choosing a Marketing
Intelligence Platform (in plain English: to choose data mining software and to figure out how it should be deployed). One of the most interesting challenges of writing the RFI and RFP was agreeing with the business on what was important to them. I
found it pleasing that most of the points I put forward for discussion were
listed in one or two of the questions in the survey. I will hold off voicing my
very strong opinions until the survey results are published.
I am also very curious to see the results of the survey with
regards to software preference and how the response has varied over time (this
is one of the constant questions). During the process of engaging with the
software providers and researching the web I have come to realise how much this
arena has changed just in the last few years. I believe that the abundance of solutions might, to an extent, lead to software selection paralysis. It is
important not to drop the ball and remember why your organisation is looking
into data mining and to clearly define what you expect the software to deliver (back to the original question above).
Do your bit and complete this survey (www.RexerAnalytics.com/Data-Miner-Survey-2013-Intro.html)
– let's see how the responses pan out.
Tuesday 22 May 2012
Migrating sas from XP(32) to Win7(64)
A client that is preparing to roll out windows 7 64 bit to all its employees asked me to ensure that the sas functionality is not lost. Currently they use V9.2 on xp in the good old fashioned way – disparate pc installations of base/stat/graph/ets ….
Repeating the same installation script used for xp 32 bit, I experienced only one difference – access to office files. This applies to the versions of office that use the four-letter extensions (e.g. xlsx instead of xls). The fix is to add the ‘pc files server’ to the installation script and to modify the relevant proc import and libname statements.
The statement in Win XP (32)
proc import datafile="<filename>.xlsx" out=<dataset> dbms=excel; run;
should be modified in Win7 64 to specify a different dbms
proc import datafile="<filename>.xlsx" out=<dataset> dbms=excelcs; run;
Pointing at an MS Access database in Win XP(32)
Libname mylib "<filename>.accdb";
should be modified in Win7 64 to specify the pc files engine
Libname mylib pcfiles path="<filename>.accdb";
Exploring the issues in the installation, I had interesting chats with the IT people responsible for purchasing sas and packaging it for enterprise-wide installation. I could fill pages and pages discussing their thoughts, pains and complaints. It boils down to poor documentation (EXPLANATION) of the installation decisions that need to be made and the PRICE. I had to scrape the person off the floor after he got the quote from sas for a server (no frilly stuff like BI or EG).