I'm interested in learning one of these two tools. What would you recommend? Are there any good books or training programs that are not too expensive? (Bear in mind that I don't have either of these two software packages.)

Thanks in advance for your help.

Hi Sandro,

I stumbled across this today and just joined Analytic Bridge, so apologies for the late reply.

I'm making some big leaps of logic and assuming you want to learn these tools to get a job :)

Of course, being ex-SPSS (I was the Clementine tech expert at SPSS) and, for a few years, a heavy user of Clementine in industry, I suppose my response should be a simple 'Clementine', but to be honest I think employers demanding specific toolset experience is naive and foolish.

So, can I be a pain-in-the-arse and say "neither" ?

I work in a telco, doing typical customer analytics (churn, cross-sell, SNA, profiling etc), so my comments are from that industry point of view :)

Having any data mining toolset skill is a bonus, but what I look for in a graduate or experienced analyst is a varied background and good skills in a few principal areas:
- basic stats (sampling methods, outlier detection and handling etc).
- databases and SQL (awareness of how to process terabyte-size data sources).
- can clearly communicate and describe/illustrate complex analysis to dummies (and never use words like 't-test' or 'eta value' unless asked :).
- can use the words "yes" and "no" to authority figures (the "no" is the important one).

It is surprising how many people claiming to be data miners have this outward appearance of a socially maladjusted, equation-quoting nerd with no ability to project manage or prioritise. In the past I've rejected analysts with 10+ years' experience in favour of graduates simply because of presentation skills and ability to communicate.

In my dept we use Clementine, but we have other areas of the company that use SAS (risk for example). These toolsets are designed to be easy to learn, and most companies are prepared to train you.

Ok, here's the 'against SAS' viewpoint :) The hard fact is that organisations that invest heavily in SAS typically have SAS data sets and don't as often use the data warehouse for analytics. The SAS datasets may be of a fixed format and summarised a specific way (less likely to be flexible). This is changing (SAS are working hard to get data warehouse integration), but when I ask my peers with a SAS solution they often describe an inflexible set-up in which data is extracted out of the data warehouse and ETL'd into SAS files, a process that has been established for years (but works well for them).

Conversely, SPSS Clementine is designed to work with the existing data warehouse (to be honest it can't work effectively without one). Since SPSS has been committed to this course of development for some time, Clementine has been leading on concepts such as in-database processing and in-database mining for a few years now. Data mining and processing steps are converted automatically and transparently into SQL (including scoring models like CART, C5, neural nets etc) and processed on the data warehouse. The ability to represent an entire data mining project as SQL allows billions of rows of data to be analysed in highly efficient data warehouses.
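To make the "model scoring as SQL" idea a bit more concrete, here is a minimal sketch in Python. It is not how Clementine actually generates its SQL; it only illustrates the general technique of rendering a small decision tree as a nested CASE expression so the database does the scoring. The classes `Leaf` and `Split`, the function `tree_to_sql`, the `customers` table, its columns and the scores are all made up for illustration, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    score: float  # hypothetical churn score assigned at this leaf


@dataclass
class Split:
    column: str       # column tested at this node (trusted name, not user input)
    threshold: float  # rows with column <= threshold follow the left branch
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def tree_to_sql(node: Node) -> str:
    """Render a binary decision tree as a nested SQL CASE expression."""
    if isinstance(node, Leaf):
        return repr(node.score)
    return (
        f"CASE WHEN {node.column} <= {node.threshold} "
        f"THEN {tree_to_sql(node.left)} ELSE {tree_to_sql(node.right)} END"
    )


# A toy tree: new customers are risky, and so are long-standing
# customers who call support a lot.
tree = Split(
    "months_active", 6,
    Leaf(0.4),
    Split("calls_to_support", 3, Leaf(0.1), Leaf(0.7)),
)
score_sql = tree_to_sql(tree)

# SQLite stands in for the warehouse; the scoring runs inside the database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER, months_active REAL, calls_to_support REAL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 3, 0), (2, 24, 1), (3, 24, 5)],
)
rows = conn.execute(
    f"SELECT id, {score_sql} AS churn_score FROM customers ORDER BY id"
).fetchall()
print(score_sql)
print(rows)
```

The point is that only the SQL text travels to the database and only the scores come back, so the rows themselves never have to leave the warehouse.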

These skillsets are not easy to learn unless you are already in the job, but realisation of the problem is halfway to solving it.

Hope my ramblings help...

You are correct: SAS shops will always use SAS datasets as part of the solution. However, if you choose, you CAN work directly from the data warehouse. But does that make sense? Remember, data is often changing within the warehouse itself, and that is the main reason to perform an extract of the data to SAS datasets. That way you are guaranteed to get the same results from an analysis from one day to the next.

Another reason to do an ETL of the data is this: variables kept on the warehouse are usually not predictive by themselves in their native form, and it takes many transformations or combinations of variables before one can even start to do a data analysis. By this I mean things like additional data cleaning, log transforms, and imputing missing values. These transforms can be performed via the mining tool, whether it be Clementine or Enterprise Miner, but it is often easier and MORE flexible just to keep them on SAS datasets. It also saves much processing time, especially if you are dealing with terabytes of data. Many of the best data miners do it this way, and it doesn't matter which tools you are using, SPSS or SAS.
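As a rough illustration of the kind of transforms I mean, here is a small Python sketch using only the standard library. The variable `monthly_spend` and its values are invented, and median imputation followed by a log(1 + x) transform is just one common choice, not a recommendation for any particular data set.

```python
import math
from statistics import median


def impute_median(values):
    """Replace missing values (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def log_transform(values):
    """Apply log(1 + x), which tames right-skewed values and tolerates zeros."""
    return [math.log1p(v) for v in values]


# Hypothetical raw warehouse variable: skewed, with gaps.
monthly_spend = [12.0, None, 30.0, 0.0, 950.0, None, 18.0]

filled = impute_median(monthly_spend)
transformed = log_transform(filled)

print(filled)
print([round(t, 3) for t in transformed])
```

Once derived variables like these are stored in the extract, every later analysis starts from the same prepared inputs instead of recomputing them.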

Thanks Tim and Ralph for all these details. I'm now using R since my company is quite small and doesn't want to invest money in big data mining tools. However, I'm looking forward to learning and using Enterprise Miner and/or Clementine.

I would like to know more about what exactly the data mining tools do and how they work, since it will be helpful for my research. Can you please help?

How did you get the SAS EM software?
Yes, very good question, Bala Ali! SAS licenses are expensive, even for BASE SAS. I used SAS Enterprise Guide 4.1 at my most recent contracting stint with a large American managed care company. However, the cost of Enterprise Miner was significant enough that even this profitable, well-established insurer required a very solid justification and many signatures before going forward with EM. (I just re-read the thread and don't see any reference to any actual use of the application, only documentation for EM, and a SAS EM 5.1 pdf file, which surely must be a doc, not software?)

In the U.S.A., SAS Corp gives special licenses and support to the Centers for Disease Control, as a public service. When necessary, the C.D.C. shares these licenses (after issuing restrictive covenants regarding use) with state government epidemiologists for public health studies and program support. This was how I was able to learn SAS, despite working in the (poorly-funded) public sector.

I recommend the "Little SAS Book" primers, third edition, by Lora Delwiche and Susan Slaughter, published by SAS Press. The series also includes separate texts for BASE SAS, Enterprise Guide and Enterprise Miner. I haven't found anything comparable for SPSS, and rely mostly on the software documentation.
Thanks Lisa! I have already read this book, which is a very good one to start with. But from my point of view, it is only an introductory book.
Hi Sandro,

I assume your interest in developing a DM skill set is meant to improve your employment opportunities.

On the other hand, you may be limiting yourself if you are driven only by a tool. As this is an individual investment, you should be more concerned about the cost of acquisition/ownership.

You may want to give one of the learning/teaching resources a try:
Handbook of Statistical Analysis and Data Mining Applications, by Robert Nisbet, John Elder and Gary Miner, published by Elsevier, ISBN 978-0-12-374765-5.

If we can be of any assistance, please let us know.

Thanks !
Hi Biswajit,

I have now been working with SAS (Base, EG, EM) for more than a year, but I always like to learn from books. I have bought the book you mentioned. It is an excellent book which focuses on the SAS, SPSS and Statistica tools. I will soon write a review of it on my blog.

Is there a book we can buy from you in India at a slightly discounted price?
I learned SPSS Clementine on my own rather easily at PepsiCo, but I had a harder time learning SAS Enterprise Miner at State Farm. I am now using SAS Enterprise Guide at PepsiCo, which is a bit more challenging than plain SAS 9.1 or 9.2, but not nearly as hard as Enterprise Miner.
Mathew, you should take some advanced data mining courses with SAS!!!

