Pentaho Data Integration

Power to access, prepare and blend all data

Pentaho Data Integration prepares and blends data to create a complete picture of your business that drives actionable insights. The complete data integration platform delivers accurate, “analytics ready” data to end users from any source. With visual tools to eliminate coding and complexity, Pentaho puts Big Data and all data sources at the fingertips of business and IT users alike.


Simple Visual Designer for Drag and Drop Development

Empower developers with visual tools to minimize coding and achieve greater productivity.

Drag and Drop Visual Design Approach
  • Graphical extract-transform-load (ETL) tool to load and process big data sources in familiar ways.
  • Rich library of pre-built components to access and transform data from a full spectrum of sources.
  • Visual interface to call custom code and to analyze image and video files to create meaningful metadata.
  • Dynamic transformations, using variables to determine field mappings, validation and enrichment rules.
  • Integrated debugger for testing and tuning job execution.
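
The idea behind dynamic, variable-driven transformations can be sketched outside the tool. The example below is plain Python, not Pentaho's API: the names FIELD_MAP, VALIDATORS, ENRICHERS and transform are hypothetical, chosen only to show how field mappings, validation rules and enrichment rules might be supplied as run-time variables instead of being hard-coded in the flow.

```python
# Illustrative sketch only: plain Python, not Pentaho's API.
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]

# Hypothetical run-time variables. In a real project these might be read
# from a parameter file rather than defined inline.
FIELD_MAP: Dict[str, str] = {"cust_nm": "customer_name", "amt": "amount"}
VALIDATORS: Dict[str, Callable[[Any], bool]] = {
    "amount": lambda v: isinstance(v, (int, float)) and v >= 0,
}
ENRICHERS: Dict[str, Callable[[Row], Any]] = {
    "amount_band": lambda r: "high" if r["amount"] >= 1000 else "standard",
}


def transform(
    rows: List[Row],
    field_map: Dict[str, str],
    validators: Dict[str, Callable[[Any], bool]],
    enrichers: Dict[str, Callable[[Row], Any]],
) -> Tuple[List[Row], List[Row]]:
    """Rename fields, validate them, then enrich the rows that pass."""
    accepted: List[Row] = []
    rejected: List[Row] = []
    for row in rows:
        # Rename source fields; unmapped fields keep their original name.
        mapped = {field_map.get(key, key): value for key, value in row.items()}
        # A row passes only if every configured rule accepts its field.
        if all(check(mapped.get(field)) for field, check in validators.items()):
            for name, derive in enrichers.items():
                mapped[name] = derive(mapped)
            accepted.append(mapped)
        else:
            rejected.append(mapped)
    return accepted, rejected
```

Changing the mapping or a rule then means changing a variable, not redesigning the flow.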

Big Data Integration with Zero Coding Required

Pentaho's intuitive tools accelerate the design, development and deployment of big data analytics by as much as 15x.

Big Data Integration Made Easy
  • Complete visual big data integration tools eliminate the need to code in SQL or write MapReduce functions in Java.
  • Broad connectivity to any type or source of data with native support for Hadoop, NoSQL and analytic databases.
  • Parallel processing engine to ensure high performance and enterprise scalability.
  • Extract and blend existing and diverse data to produce consistent, high-quality, ready-to-analyze data.

Learn more at pentahobigdata.com.

Native and Flexible Support for all Big Data Sources

A combination of deep native connections and an adaptive big data layer ensures accelerated access to the leading Hadoop distributions, NoSQL databases, and other big data stores.

Broadest and Deepest Big Data Support
  • Support for the latest Hadoop distributions from Cloudera, Hortonworks, MapR and Intel.
  • Simple plugins to NoSQL databases such as Cassandra and MongoDB, as well as connections to specialized data stores like Amazon Redshift and Splunk.
  • Adaptive big data layer saves enterprises considerable development time as they leverage new versions and capabilities.
  • Greater flexibility, reduced risk, and insulation from changes in the big data ecosystem.
  • Reporting and analysis on growing amounts of user- and machine-generated data, including web content, documents, social media and log files.
  • Integration of Hadoop data tasks into overall IT/ETL/BI solutions with scalable distribution across the cluster.
  • Support for parallel bulk data loader utilities for loading data with maximum performance.

Powerful Administration and Management

Simplified out-of-the-box capabilities to manage the operations in a data integration project.

Easy-to-Use Schedule Management
  • Manage security privileges for users and roles.
  • Restart jobs from last successful checkpoint and roll back job execution on failure.
  • Integrate with existing security definitions in LDAP and Active Directory.
  • Set permissions to control user actions: read, execute or create.
  • Schedule data integration flows for organized process management.
  • Monitor and analyze the performance of data integration processes.
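
Restarting a job from its last successful checkpoint can be pictured with a small sketch. This is plain Python under stated assumptions, not how Pentaho implements the feature: run_job, the step names and the JSON checkpoint file are all hypothetical, and the sketch covers restart only, not rollback.

```python
# Illustrative sketch only: plain Python, not Pentaho's implementation.
import json
from pathlib import Path
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def run_job(steps: List[Step], checkpoint: Path) -> List[str]:
    """Run steps in order, skipping those recorded as done in the checkpoint.

    Returns the names of the steps executed during this call.
    """
    done: List[str] = (
        json.loads(checkpoint.read_text()) if checkpoint.exists() else []
    )
    executed: List[str] = []
    for name, action in steps:
        if name in done:
            continue  # completed in an earlier run, so do not repeat it
        action()  # if this raises, the checkpoint keeps the last good state
        done.append(name)
        checkpoint.write_text(json.dumps(done))  # persist after each success
        executed.append(name)
    return executed
```

If the second of three steps fails, calling run_job again with the same checkpoint file resumes at that second step without repeating the first.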

Data Profiling and Data Quality

Profile data and ensure data quality with comprehensive capabilities for data managers. 

Data Quality Management
  • Identify data that fails to comply with business rules and standards.
  • Standardize, validate, de-duplicate and cleanse inconsistent or redundant data.
  • Manage data quality with partners such as Human Inference and Melissa Data.
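
To make the standardize, validate and de-duplicate steps above concrete, here is a minimal sketch in plain Python. It does not use Pentaho or any partner product: the record fields (name, email), the simple e-mail rule and the function names are assumptions chosen for illustration only.

```python
# Illustrative sketch only: plain Python, not Pentaho's data quality tooling.
import re
from typing import Dict, List, Set, Tuple

Record = Dict[str, str]

# Deliberately simple business rule: something@something.tld
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def standardize(rec: Record) -> Record:
    """Collapse whitespace and normalize the case of name and email."""
    return {
        "name": " ".join(rec.get("name", "").split()).title(),
        "email": rec.get("email", "").strip().lower(),
    }


def clean(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Return (valid unique records, records that fail the e-mail rule)."""
    seen: Set[str] = set()
    valid: List[Record] = []
    rejected: List[Record] = []
    for rec in map(standardize, records):
        if not EMAIL_RE.match(rec["email"]):
            rejected.append(rec)  # fails the business rule
        elif rec["email"] not in seen:
            seen.add(rec["email"])  # first occurrence wins
            valid.append(rec)
        # later duplicates of an already-seen email are dropped
    return valid, rejected
```

Standardizing before de-duplicating matters here: two records that differ only in case or spacing collapse into one only after they have been normalized.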