Data Intelligence, Business Analytics
Hi. I'm currently evaluating the data mining tools above and would like to ask some questions.
1) How good are the tools at accessing and managing data? Which is the best of the four?
2) Is Enterprise Miner a machine learning tool?
3) Do Orange, R and Enterprise Miner support multiple cores?
4) Is Orange a white-box or a black-box tool?
5) Does Enterprise Miner provide scripting for data mining?
Any other information that would help me make a clear comparison between these four data mining tools would be welcome.
Thanks.
Permalink Reply by jadelim on June 15, 2012 at 12:32am Also, in terms of power and flexibility, as well as scalability, which software is better?
Permalink Reply by Balázs Bárány on June 15, 2012 at 9:57am I know R and RapidMiner so that's what I will answer about.
1)
R has a huge array of possibilities for connecting to databases, Big Data solutions, and processing all kinds of files and documents, including save files from other statistics packages. But you need to learn the scripting/programming language in order to process your data.
RapidMiner also has the most important connectors: databases, Excel, CSV, etc. In RapidMiner, you can use wizards for setting up your data sources and a graphical environment for processing data flows.
3) R has support for multiple cores (and even computing clusters) with the foreach packages. RapidMiner also has a parallel execution plugin.
Both R and RapidMiner are available for free and with source code, and also as professionally supported commercial packages, from the authors (RapidMiner) or from Revolution Analytics (R).
RapidMiner can also execute R scripts for data input, transformation and graphing so you can easily connect the two.
Which is better depends on your background and your needs. There are lots of good books for R; RapidMiner is more intuitive and you can find ready-to-use examples on myexperiment.org.
R is where cutting-edge statistical and data mining research happens. Of course, for that, you need to look at and try new, sometimes experimental packages. Everything in the canon of established methods and algorithms is there, too. RapidMiner has a smaller but still huge range of data mining methods, and it can use the Weka library, which adds many more.
Permalink Reply by jadelim on June 16, 2012 at 7:43am Thanks for the information. Can I know more about R and RapidMiner in terms of data manipulation? Do they extract samples, access the database directly, or both? And does R have more data connectivity options (ODBC, gateways) than RapidMiner?
Can R and RapidMiner pass rules directly to OLAP tools and receive data for mining from them, and can they access a data warehouse directly?
Which software is better with regard to size constraints (the maximum number of rows or records it can handle)?
Just to confirm, both of them support mining very large databases, right? Which one is better at this?
Permalink Reply by Balázs Bárány on June 16, 2012 at 11:35pm Both R and RapidMiner have direct access to most relational databases. R supports ODBC and many database systems directly. RapidMiner is written in Java so it uses JDBC; most relational database systems have JDBC drivers. There should be some JDBC to ODBC bridge, too.
The difference lies in support for the file formats of obscure statistics packages you have probably never heard of. You shouldn't have problems reading your relational database data and files in RapidMiner or R.
R is also a programming language, and RapidMiner can be extended with Groovy scripts or Java modules, so in the end you can write any data access method yourself, including access to OLAP tools if they have a defined API.
Both R and RapidMiner are memory-based systems, so they can analyze your data only as long as it fits into your computer's RAM. On a 64-bit operating system you can easily have 24 GB of RAM and analyze more than 20 GB of data.
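To make the memory point concrete, here is a rough back-of-the-envelope sketch in Python (it is not R or RapidMiner code). It assumes a purely numeric table stored as 8-byte doubles, which is how R stores numeric vectors. The function names and the `overhead_factor` parameter are hypothetical, introduced only for illustration; real memory use depends on the column types and on how many temporary copies an analysis makes.

```python
BYTES_PER_DOUBLE = 8  # one double-precision number takes 8 bytes


def estimated_size_gb(rows: int, numeric_columns: int) -> float:
    """Approximate in-memory size of an all-numeric table, in GB (10**9 bytes)."""
    return rows * numeric_columns * BYTES_PER_DOUBLE / 1e9


def fits_in_ram(rows: int, numeric_columns: int, ram_gb: float,
                overhead_factor: float = 1.0) -> bool:
    """Return True if the table, scaled by an assumed overhead, fits in RAM.

    overhead_factor is a hypothetical allowance for temporary copies made
    during analysis; it is an assumption, not a measured value.
    """
    return estimated_size_gb(rows, numeric_columns) * overhead_factor <= ram_gb


if __name__ == "__main__":
    # Hypothetical table: 100 million rows with 20 numeric columns.
    print(f"{estimated_size_gb(100_000_000, 20):.1f} GB")
    print(fits_in_ram(100_000_000, 20, ram_gb=24))
    # Same table, assuming the analysis needs one full extra copy.
    print(fits_in_ram(100_000_000, 20, ram_gb=24, overhead_factor=2.0))
```

The raw data in this hypothetical case fits into 24 GB, but it no longer does once a full working copy is assumed, so it is worth leaving some headroom when sizing a machine.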
Revolution Analytics, a provider of commercially enhanced R versions, also has extensions for processing larger datasets.
RapidMiner has Radoop (beta version), which uses the Hadoop environment for processing large datasets.
Permalink Reply by jadelim on June 17, 2012 at 8:16am Could you help me grade the following, in your opinion?
Rate from 1 to 5 (1 = very bad, 5 = excellent).
| | RapidMiner | R |
|---|---|---|
| Product architecture | | |
| - Data manipulation: extract sampling, direct access to database, or both? | | |
| - Warehouse/OLAP integration | | |
| - Connectivity to other tools | | |
| Performance | | |
| - Support for multiple user access | | |
| - Support for mining very large databases | | |
| Function | | |
| - Mining approaches | | |
| - Mining techniques | | |
| Presentation | | |
| - Data visualisation | | |
| Environment | | |
| - Platform independence | | |
| - Size constraints (in handling maximum number of rows or records) | | |
Permalink Reply by Balázs Bárány on June 17, 2012 at 1:41pm It's hard to assign numbers because it depends on your environment, your programming ability and the technology you are using.
With R, everything is possible, but you need to learn the R language and install packages.
RapidMiner has a graphical modelling interface for ease of use, but if your needs are special, you will need to do some scripting or develop extensions, too.
RapidMiner has good and easy to use graphical capabilities. R is the champion in visualisation but it is a bit harder to create pretty graphs. (Recently, some graphical interfaces for graphs have been developed, search for "Deducer ggplot2".)
The size constraints of the standard packages depend on your memory size; with the commercial Big Data tools, both can support almost unlimited data sets.
Permalink Reply by Balázs Bárány on June 17, 2012 at 11:06pm No. R has both established and experimental algorithms. R is probably the overall leader in mining techniques because most researchers use R for their first publication.
For example, there are not only classical decision trees (in package rpart) but also an innovative approach called conditional inference trees (in package party).
Permalink Reply by jadelim on June 18, 2012 at 1:13am So what other techniques does R have besides decision trees and conditional trees?
Permalink Reply by Balázs Bárány on June 18, 2012 at 1:32am Everything. SVM, neural nets, regression, whatever you want. As I wrote, R is the favourite tool of scientists, so both well-established and experimental research algorithms are available.
Permalink Reply by jadelim on June 18, 2012 at 2:15am But are these supported only in the enterprise version of R, not in the open source version?
Permalink Reply by jadelim on June 19, 2012 at 8:06pm Hi. Can you give me more information on what R and RapidMiner provide in their open source versions, not the commercial ones? Thanks.
Permalink Reply by Nissim Matatov on July 19, 2012 at 10:18am See http://rexeranalytics.com/Data-Miner-Survey-Results-2010.html .
Next survey will be in early 2013.
© 2013 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC