Finding Delta

Discovering gems of wisdom from massive data sets


Business Intelligence And Data Warehousing On A Budget


Business intelligence suites and products offer a lot, and their vendors provide great service and system support. However, these vendor solutions can be extremely expensive! I really don’t think you need an expensive ETL suite or business intelligence product to run a rewarding data warehouse and analytical plant. I run a very large plant with the following simple components and open source technologies.

• A Linux/Unix scheduling system.
• A general script to load delimited data into a db table.
• A general script to run a proc.
• A general script to extract delimited data from another db via proc or table extract.
• A general script to extract delimited data from the web.
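As a rough illustration of the load and extract scripts listed above, here is a minimal Python sketch. It uses the standard-library sqlite3 module as a stand-in for MySQL so that it runs anywhere. The function names load_delimited and extract_delimited, the header-row convention and the tab delimiter are all assumptions made for illustration, not the actual scripts behind this warehouse.

```python
import csv
import re
import sqlite3  # stand-in for a MySQL driver (assumption, chosen so the sketch is self-contained)

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check_ident(name):
    """Reject table/column names that are not plain identifiers."""
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def load_delimited(conn, table, path, delimiter="\t"):
    """Load a delimited file into an existing table.

    Assumes the first row of the file holds the column names.
    Returns the number of data rows inserted.
    """
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter=delimiter)
        columns = [_check_ident(c) for c in next(reader)]
        rows = list(reader)
    sql = "INSERT INTO {} ({}) VALUES ({})".format(
        _check_ident(table),
        ", ".join(columns),
        ", ".join("?" for _ in columns),
    )
    with conn:  # commit on success, roll back on error
        conn.executemany(sql, rows)
    return len(rows)


def extract_delimited(conn, query, path, delimiter="\t"):
    """Run a query and write the result, with a header row, to a delimited file.

    Returns the number of data rows written.
    """
    cur = conn.execute(query)
    count = 0
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter=delimiter)
        writer.writerow([d[0] for d in cur.description])
        for row in cur:
            writer.writerow(row)
            count += 1
    return count
```

Calling load_delimited(conn, "sales", "sales.tsv") would load a tab-delimited file into a table named sales (both names hypothetical). Against MySQL the same shape should carry over, although the common Python MySQL drivers use %s rather than ? as the parameter placeholder.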

Procs on a scratch db that sits alongside your main data warehouse db can be used to transform the data and load it into the main warehouse.
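To make the scratch-to-warehouse step concrete, here is a small Python sketch of the idea. It again uses sqlite3 as a stand-in, with ATTACH DATABASE playing the role of a second database on the same server and a plain INSERT ... SELECT standing in for the body of a stored proc. The table names sales_raw and daily_sales and the function run_transform are hypothetical.

```python
import sqlite3

# Hypothetical transform: aggregate raw rows in the scratch db and
# write the result into the main warehouse (stands in for a proc body).
TRANSFORM_SQL = """
INSERT INTO warehouse.daily_sales (region, total)
SELECT region, SUM(amount) FROM sales_raw GROUP BY region
"""


def run_transform(scratch_conn, warehouse_path, sql=TRANSFORM_SQL):
    """Run a transform on the scratch db that loads into the warehouse db.

    The scratch connection must have no open transaction when this is called,
    because SQLite cannot attach a database inside a transaction.
    """
    scratch_conn.execute("ATTACH DATABASE ? AS warehouse", (warehouse_path,))
    try:
        with scratch_conn:  # commit on success, roll back on error
            scratch_conn.execute(sql)
    finally:
        scratch_conn.execute("DETACH DATABASE warehouse")
```

On MySQL the equivalent would more likely be a stored procedure in the scratch schema that the general proc-running script invokes with a CALL statement; the sketch only shows the shape of staging in one db and loading into the other.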

I’d recommend an open source stack of the following:

Scheduler: cron/puppet
ETL Scripts: Python (Perl would also work well)
DB Storage: MySQL
Data Analysis: Excel, R, Python

Obviously this will not solve everybody’s needs; however, with the correct schema architecture this warehouse would scale for the majority of businesses at very little cost to build and maintain.

In future posts I aim to outline why these agile techniques help you build a plant for your own needs without the extraordinarily high yearly BI toolset costs.

Let me know if this appeals to you and I’ll create more detailed posts to follow…

Written by mattalcock

November 11, 2009 at 6:32 pm