SQL Power Business Intelligence & Data Migration ConsultingThe Canadian Business Intelligence (BI) Authority, based in Toronto
Company OverviewConsulting ServicesMobile BIXBRL SolutionsCase Studies: Client Success StoriesResources & TrainingTechnology PartnersContact Us

Open Source BI Market Overview

Open Source
Business Intelligence
Market Overview

September 1, 2009
Contributed by Steve Holub

Open Source

 

The following survey provides a list of open source software tools used in Business Intelligence and Data Warehousing systems. The tool selection criteria was based on the frequency and currency of the releases (solutions with updates within two years or less only are included) and on whether the product has released a stable build which could be used in a production environment. This study looked at BI tools in the following categories: Databases, ETL tools (Extract, Transform, Load), Master Data Management, BI Reporting tools, and Data Mining. There are some open source software bundles that overlap categories;. In this case, we divided the software bundle into its separate parts for ease of categorization.

We define 'fully open source' to mean software solutions that offer all of their source code to the public under an open source license. This means that any software which offers additional features or functionality (typically with a support subscription), and does not offer the source code freely to the public, was not considered fully open source. As well, any features mentioned in the tables below refer to the open source version of the software. Organizations that do not provide commercial support for their product, but do point their customers to third-party support solutions receive a "Y" with a notation (*), to indicate that commercial support is available. Our analysis was based on a review of the OSS tools market as of July, 2009.

 

Database Management Systems (DBMS)

There are a wide variety of database management systems (DBMS) available for data warehouse solution. Standard relational databases, such as MySQL and PostgreSQL, have dominated the open source database market for some time. More recently, new open source database systems have emerged that target the unique requirements of data warehousing. Column-oriented databases (such as Infobright, LucidDB, and MonetDB) increase data reading performance by storing data in columns (rather than rows) which is important for a data warehouse where read-optimization takes precedence over write-optimization, and where typical queries look at the attributes of a column for a set of records rather than the components of a row.

There have been several developments in the area of distributed-computing (i.e. the 'cloud'). The distributed database solutions presented below are built on top of the open source Java framework for distributed systems: Hadoop. Hadoop-based databases are best suited for data warehouses which number in the hundreds of millions of rows (or greater). These distributed data warehouse systems have typically been used for web analytics of high-traffic websites. Finally, although there are several open source embeddable database solutions, none of them are scalable to the extent required by most data warehouse systems, and for that reason, were excluded from this study.

 

DBMS Software,
Website,
Company/Sponsor
DB Type Fully Open Source? Commercial Support? Latest Version Release
Cloudbase
cloudbase.sourceforge.net
Business.com
MapReduce, distributed Y June 2009
Firebird
www.firebirdsql.org
Firebird Foundation/Borland
Relational Y Y* April 2009
HBase
hadoop.apache.org/hbase
Apache
Column-oriented, distributed Y May 2009
Hive
hadoop.apache.org/hive
Apache
MapReduce, distributed Y Y* April 2009
Hypertable
www.hypertable.org
Doug Judd/Zvents
Column-oriented, distributed Y May 2009
Infobright
www.infobright.com
Infobright
Column-oriented Y June 2009
Ingres
www.ingres.com
Ingres
Relational Y November 2008
LucidDB
www.luciddb.org
LucidEra/Eigenbase
Column-oriented Y May 2009
MonetDB
monetdb.cwi.nl
Centrum Wiskunde & Informatica
Column-oriented Y Y July 2009
MySQL
www.mysql.com
Sun/Oracle
Relational Y April 2009
Palo OLAP Server
www.jedox.com
Jedox
Multidimensional Y March 2009
PostgreSQL
www.postgresql.org
PostgreSQL
Relational Y Y* June 2009

 

Extract, Transform and Load (ETL) Tools

Data Extraction, Transformation and Loading (ETL) is the most time consuming component of the data warehouse development lifecycle. Typically for this reason alone, it is important to have a good tool which allows the user to interact with as many different source systems as possible, while maintaining an acceptable level of usability and performance and providing metadata regarding the transformations undertaken against the data. There are several open source ETL tools available today that provide good performance, good error handling, and metadata management.

 

ETL Software,
Website,
Company/Sponsor
Fully Open Source? Commercial Support? Latest Version Release
Apatar
www.apatar.com
Apatar
Y Y April 2009
Clover.ETL
www.cloveretl.com
OpenSYS
Y July 2009
KETL
www.ketl.org
Kinetic Networks
Y Y* October 2008
Open Data Integrator Project
open-dm-di.dev.java.net
Sun/Oracle
Y September 2008
Palo ETL Server
www.jedox.com
Jedox
Y December 2008
Pentaho Data Integration (Kettle)
kettle.pentaho.org
Pentaho
Y May 2009
Talend Open Studio
www.talend.com
Talend
Y May 2009

 

Master Data Management

Business Intelligence environment are often sourced from many different source systems - from spreadsheets to ERP applications to 3rd party data - which may have the same key dimension data (eg. Customer, Product, Salesrep) repeated in more than one source. Master data management (MDM) tools are used to identify duplicate records and consolidate them into one unified record which is then stored in the data warehouse.

The same customer could be viewed as a potential lead by the marketing department, while the billing department is treating her as an existing client. Potentially, this customer could be receiving both bills for a service and marketing material to encourage her to purchase the very same service which she is already consuming. Data warehouse systems consolidate these two customer records into one through the use of MDM tools ensuring that the customer information and status is consistent across the business system.

MDM Software,
Website,
Company/Sponsor
Fully Open Source? Commercial Support? Latest Version Release
DataCleaner
datacleaner.eobjects.org
Eobjects
Y Y* April 2009
SQL Power DQguru
www.sqlpower.ca/dqguru
SQL Power Group
Y Y April 2009
Mural
mural.dev.java.net
Sun/Oracle
Y September 2008
Open Source Data Quality and Profiling project
sourceforge.net/projects/dataquality
Arrah
Y July 2008

 

Business Intelligence Reporting & Analytics Tools

Business intelligence (BI) reporting & analytics tools are what business users typically use to access their data warehouse data, so it is imperative that the tool be intuitive and user-friendly. There are a variety of open source solutions which deliver a variety of features that target different groups of users (e.g. report developers, decision makers, non-specialized users, etc.), so it important to explore as many tools as possible before deciding on which software package is right for your organization. In the table below, we focus on four common features found in BI reporting software: standard reporting, ad hoc reporting, OLAP reporting, and dashboards.

Standard reports are those commonly found in business which can address a variety of different business reporting requirements (e.g. monthly sales by store by product).

Ad-hoc reporting allows non-technical business users to develop 'on-the-fly' reports without having any knowledge of SQL or the underlying database structure.

OLAP reporting refers to online analytical processing which is a technique specifically designed to improve performance of business intelligence queries by pre-aggregating data (and other common analytical tasks) and by modeling the data multi-dimensionally. The benefit of having OLAP functionality is the ease with which users can drill-down and roll-up into different levels of data hierarchies (e.g. country, to province, to region, to store).

Dashboards are user-defined visual summaries of key performance indicators for a business which allows the user to interact with the charts and graphs (e.g. drilling-down into specific business areas).

Reporting & Analytics Software,
Website,
Company/Sponsor
Standard Reporting Ad-hoc Reporting OLAP Dash-
boards
Fully Open Source? Commercial Support? Latest Version Release
BIRT
www.eclipse.org/birt
Eclipse Foundation/
Actuate
Y Y Y Y June 2009
DataVision
datavision.sourceforge.net
Jim Menard
Y Y Y July 2008
Jaspersoft BI Suite
www.jasperforge.org
Jaspersoft
Y Y Y Y June 2009
OpenI
www.openi.org
OpenI
Y Y Y Y Y Y June 2009
OpenReports
www.oreports.com
Open Source Software Solutions
Y Y Y Y May 2009
OpenRPT
www.xtuple.com/openrpt
xTuple
Y Y Y January 2009
Palo Worksheet
www.jedox.com
Jedox
Y Y Y Y Y March 2009
Pentaho BI Platform
community.pentaho.com
Pentaho
Y Y Y Y Y May 2009
SQL Power Wabit
www.sqlpower.ca/wabit
SQL Power Group
Y Y Y Y Y Y July 2009

 

Data Mining Tools

Data mining techniques are used in business intelligence systems to determine patterns and relationships amongst data attributes and potential outcomes. Data warehouses are the most common source of data for mining business information because they are typically the most comprehensive and extensive data source available within the company, and are well suited for time-series analysis. Data mining has the potential of discovering correlations that are hidden within the sea of data. For example, by using data mining techniques, a company with sufficient customer data could determine which contacts (based on their attributes, such as age, location, marital status, etc.) are more likely to respond to a direct mail marketing campaign, which in turn could help the company reduce costs by targeting only those customers who are the most likely to respond to the advertisement.

Reporting & Analytics Software,
Website,
Company/Sponsor
Fully Open Source? Commercial Support? Latest Version Release
Knime
www.knime.org
University of Konstanz
Y Y May 2009
Pentaho Data Mining (Weka)
weka.pentaho.org
Pentaho
Y June 2009
R Project
www.r-project.org
R Development Core Team
Y Y* June 2009
RapidMiner (YALE)
www.rapidminer.com
Rapid-i
Y March 2009

 

Member Log-in

Email/ID

Password

Need an account?
Register for free