Over the last few years, the online business world has seen a great deal written about the merits of so-called “big data.” Although big data has many advantages, it should not be treated as a cure-all for everything that ails your business. For instance, although Tableau on Hadoop is a great way to connect different types of data, you still have to know what to do with that data for any of the consolidation to be worth anything.
Here are some of the things that you should definitely not expect from big data.
Your fantasy data scientist does not exist.
If you are looking for a data miner and analyst with graduate-level training in computer science, business and statistics, you are going to be disappointed. The best you can do is hire people who specialize in each of these disciplines, plus a leader or manager to coordinate their efforts. With all of these extra people involved, you will of course have a huge hole in your budget, and most small businesses cannot afford the team it takes to exploit big data in this way. Security, however, is one area where big data can’t yet handle things on its own. When it comes to advanced persistent threats, you’ll be glad to have a data scientist on hand who can assess the threat level and act quickly.
You do not need big data to solve most of your problems.
Having all of these different ways of manipulating data does not mean that you are getting any closer to a real solution. Most of your problems can be solved with an Excel spreadsheet and some simple math. Much of the so-called “machine learning” you hear so much about is really just simple statistics that any college student could perform for you.
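To show how modest that simple math can be, here is a minimal sketch of an ordinary least-squares trend line, the same calculation a spreadsheet performs with its SLOPE and INTERCEPT functions. The function name `fit_line` and the ad-spend and sales figures are hypothetical, chosen only for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope is the sum of cross-deviations divided by the sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    # The fitted line always passes through the point (mean of x, mean of y).
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical monthly figures: ad spend vs. sales, both in thousands.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]
slope, intercept = fit_line(ad_spend, sales)
print(f"sales = {slope:.2f} * ad_spend + {intercept:.2f}")
print("forecast at 6.0:", slope * 6.0 + intercept)
```

A handful of sums and one division is all it takes; on Python 3.10 or later, `statistics.linear_regression` performs the same fit in a single call.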
Many big data specialists are just moving around the same data.
It is a great time to be a big data specialist, because you often just have to push the same data around in a different way to get hired by some department at a big firm. If you are a small business, however, you should take a leaner approach. Many big data specialists are simply moving the same numbers around and putting them into different spreadsheets with pretty visual effects. Stay away from repackaged iterations of the same data.
Your business will immediately get faster if you use Hive.
Hive is not a fast big data solution. If you compare the current version of Hive with its earlier versions, it may seem fast, but compared with other solutions it is not. To get the most out of an aggregation platform such as Hadoop, you will most likely need another tool from your kit.
Every problem is somehow a problem for big data.
Being able to analyze large swaths of data is certainly useful, but it is not necessary for every problem your business has. In fact, most of your problems can be solved with much smaller data sets. Matching a few fields across a few sheets of data is not a job for big data, so do not treat it as one.
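As an illustration of how small that matching job is, here is a minimal sketch that joins two sheets on a shared column, the same task a spreadsheet VLOOKUP handles. It assumes the sheets are exported as CSV; the column names, the sample rows and the `match_sheets` helper are all hypothetical.

```python
import csv
import io

# Two small "sheets" as CSV text (hypothetical data for illustration).
CUSTOMERS_CSV = """customer_id,name,region
C1,Acme Ltd,North
C2,Bolt & Co,South
C3,Crane Inc,North
"""

ORDERS_CSV = """order_id,customer_id,amount
O1,C1,250.00
O2,C3,120.50
O3,C1,75.25
O4,C9,40.00
"""


def match_sheets(left_csv, right_csv, key):
    """Match each row of right_csv to the row of left_csv with the same key.

    Returns (matched, unmatched): matched rows combine the fields of both
    sheets; unmatched rows are right_csv rows whose key is absent in left_csv.
    """
    # Build an in-memory lookup table keyed on the shared column.
    lookup = {row[key]: row for row in csv.DictReader(io.StringIO(left_csv))}
    matched, unmatched = [], []
    for row in csv.DictReader(io.StringIO(right_csv)):
        partner = lookup.get(row[key])
        if partner is None:
            unmatched.append(row)
        else:
            matched.append({**partner, **row})
    return matched, unmatched


matched, unmatched = match_sheets(CUSTOMERS_CSV, ORDERS_CSV, "customer_id")
for row in matched:
    print(row["order_id"], row["name"], row["region"], row["amount"])
print("no matching customer:", [row["order_id"] for row in unmatched])
```

Everything fits in memory and runs in a fraction of a second with only the standard library, which is the point: no cluster is needed for a lookup of this size.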
My company does not have big data.
You do not need a huge company to have big data. Any company with more than 50 employees most likely has the ability to organize its data in a format that a big data program can analyze. You will also find it much easier to scale if you start using the small business equivalents of big data before you become a large company with too much data to handle in an amateurish way.
Virtualization is not necessarily a solution for your nodes.
If you virtualize your nodes instead of taking a more traditional approach, you may add latency or create workflow bottlenecks for no real gain. Implementing newer technologies does not automatically mean that you will increase your productivity.
You should make an effort to implement big data solutions so that you become familiar with the technology for the future, but you do not have to slow down your current operations just to use big data tools or software packages. Big data is not penicillin. It should be used where it is most effective, not just to prove a point.