Senior Big Data Consultant Resume
SUMMARY
- Open source contributor to Apache Sqoop, Apache Kudu, and the StreamSets Data Collector project.
- Strong experience creating real-time data streaming solutions using Apache Spark Core, Spark SQL and DataFrames, Spark Streaming, Apache Storm, and Kafka.
- Experience in building data pipelines using big data technologies.
- Hands-on experience in writing MapReduce programs and user-defined functions for Hive and Pig
- Experience in NoSQL technologies such as HBase and Cassandra.
- Excellent understanding and knowledge of Hadoop (Gen-1 and Gen-2) and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and ResourceManager (YARN).
- Experience in importing and exporting data between HDFS and relational database systems (RDBMS) using Sqoop.
- Proficient at using Spark APIs to cleanse, explore, aggregate, transform, and store machine sensor data (see the sketch at the end of this summary).
- Configured 20-30 node (Amazon EC2 spot instance) Hadoop clusters to transfer data between Amazon S3 and HDFS and to direct input and output to the Hadoop MapReduce framework.
- Hands-on experience with systems-building languages such as Scala and Java.
- Hands-on experience with message brokers such as Apache Kafka and RabbitMQ.
- Worked extensively with dimensional modeling, data migration, data cleansing, data profiling, and ETL processes for data warehouses.
- Implemented Hadoop-based data warehouses and integrated Hadoop with enterprise data warehouse systems.
- Built real-time big data solutions using HBase, handling billions of records.
- Involved in designing the data model in Hive for migrating the ETL process into Hadoop, and wrote Pig scripts to load data into the Hadoop environment.
- Expertise in writing Hive UDFs and generic UDFs to incorporate complex business logic into Hive queries when performing high-level data analysis.
- Worked on the Spark machine learning library for recommendations, coupon recommendations, and a rules engine.
- Experience working with various Cloudera distributions (CDH4/CDH5) and knowledge of Confidential and Amazon EMR Hadoop distributions.
- Experience in administering large-scale Hadoop environments, including design, configuration, installation, performance tuning, and monitoring of the cluster using Cloudera Manager and Ganglia.
- Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE and Core Java design patterns.
- Experience in designing both time-driven and data-driven automated workflows using Oozie.
- Experience in writing UNIX shell scripts.
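The Spark sketch referenced above: a minimal DataFrame job that cleanses, aggregates, and stores machine sensor data. The schema, paths, and validity thresholds are hypothetical placeholders, not details from any specific engagement.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SensorAggregationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SensorAggregationSketch").getOrCreate()

    // Hypothetical sensor readings with columns: sensor_id, event_time, temperature
    val readings = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/sensors/readings.csv")

    // Cleanse: drop rows missing key fields and readings outside a plausible range
    val cleansed = readings
      .na.drop(Seq("sensor_id", "temperature"))
      .filter(col("temperature").between(-40, 150))

    // Aggregate: average and peak temperature per sensor
    val perSensor = cleansed
      .groupBy("sensor_id")
      .agg(avg("temperature").as("avg_temp"), max("temperature").as("max_temp"))

    // Store the summary back to HDFS as Parquet
    perSensor.write.mode("overwrite").parquet("hdfs:///data/sensors/summary")

    spark.stop()
  }
}
```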
TECHNICAL SKILLS
Big Data Technologies: Hadoop (Confidential, Cloudera, MapR), Spark, Spark Streaming, Spark SQL, Spark ML, MapReduce, HDFS, Cassandra, Storm, Apache Kafka, StreamSets, Flume, Oozie, Solr, ZooKeeper, Tez, Data Modelling, Pig, Hive, Impala, Drill, Sqoop, and RabbitMQ.
NoSQL Databases: HBase, Cassandra
Query Engines: Hive, Pig, PrestoDB, Impala, Spark SQL
Search: HSearch, Apache Blur, Lucene, Elasticsearch, Nutch
Programming Languages: Java, Scala, Python, basics of Clojure
Cloud Platforms: Amazon Web Services (EC2, Amazon Elastic MapReduce, Amazon S3), Google Cloud Platform (BigQuery, App Engine, Compute Engine, Cloud SQL), Rackspace (CDN, Servers, Storage), Linode Manager
Monitoring and Reporting: Ganglia, Nagios, Custom Shell scripts, Tableau, D3.js, Google Charts
Data: E-Commerce, Social Media, Logs and click events data, Next Generation Genomic Data, Oil & Gas, Healthcare, Travel
Other: HTML, JavaScript, Ext JS, CSS, jQuery
PROFESSIONAL EXPERIENCE
Senior Big Data Consultant
Confidential
Responsibilities:
- Wrote a Spark Core RDD application to read 1 billion auto-generated records and compare it with IgniteRDD in the Yardstick framework, measuring the performance of Apache Ignite RDDs against Apache Spark RDDs (a simplified sketch of the Spark side appears after this list).
- Wrote a Spark DataFrame application to read 10 million Twitter records from HDFS and analyze them, using the Yardstick framework to measure the performance of Apache Ignite SQL against Spark DataFrames.
- Wrote a Spark Streaming application to read streaming Twitter data and analyze the records in real time, using the Yardstick framework to measure the performance of Apache Ignite Streaming against Spark Streaming.
- Implemented test cases for Spark and Ignite functions in Scala.
- Hands-on experience setting up a 10-node Spark cluster on Amazon Web Services using the Spark EC2 script.
- Implemented D3.js and Tableau charts to show the performance difference between Apache Ignite and Apache Spark.
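A simplified sketch of the Spark side of the RDD benchmark described above. The record generator and timing logic stand in for the Yardstick harness and the Ignite comparison, which are not shown; record count and key cardinality are illustrative only.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddBenchmarkSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RddBenchmarkSketch"))

    // Auto-generate records (1 billion in the real run; fewer here for illustration)
    val records = sc.range(0L, 10000000L, numSlices = 200)
      .map(i => (i % 1000, 1L)) // (key, count) pairs

    // Time one representative aggregation, analogous to a single Yardstick probe
    val start = System.nanoTime()
    val distinctKeys = records.reduceByKey(_ + _).count()
    val elapsedMs = (System.nanoTime() - start) / 1e6

    println(s"Aggregated $distinctKeys keys in $elapsedMs ms")
    sc.stop()
  }
}
```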
Environment: Spark (Core, DataFrames, Streaming), Scala, HDFS, Apache Ignite, Yardstick, D3.js, Tableau, AWS; 10 million Twitter records and 1 billion auto-generated records.
Senior Big Data Consultant
Confidential
Responsibilities:
- Crawled data from 100+ sites using Nutch.
- Maintained a fashion-based ontology.
- Used Scala, Spark, and its ecosystem to enrich the given data against the fashion ontology, validating and normalizing the data.
- Designed the schema and data model, and wrote the logic to store all validated data in Cassandra using Spring Data Cassandra REST.
- Wrote programs for validation, normalization, and enrichment, plus a REST API backing a UI for manual QA validation; used Spark SQL and Scala to run QA SQL queries.
- Standardized the input merchant data.
- Uploaded images to the Rackspace CDN.
- Indexed the given data sets into HSearch.
- Wrote MapReduce programs over HBase to extract color information, including density, from images.
- Wrote MapReduce programs to persist the extracted data into HBase tables; the above MR jobs run on timed, bucketed schedules.
Color-Obsessed:
- Set up the Spark Streaming and Kafka clusters.
- Developed a Spark Streaming + Kafka app to process Hadoop job logs (see the sketch after this list).
- Wrote a Kafka producer to send logs from all slave nodes to the Spark Streaming app.
- The Spark Streaming app processes the logs against the given rules and produces the bad images, bad records, missed records, etc.
- Built a Spark Streaming app to collect user-action data from the front end.
- Built a Kafka-producer-based REST API to collect user events and send them to the Spark Streaming app.
- Wrote Hive queries to generate stock alerts, price alerts, popular-product alerts, and new-arrival alerts for each user based on like, favorite, and share counts.
- Worked on the Spark ML library for recommendations, coupon recommendations, and a rules engine.
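A minimal sketch of the Spark Streaming + Kafka log-processing pattern referenced above, assuming a receiver-based Kafka stream; the ZooKeeper address, topic name, and "bad image" rule are placeholders.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object LogStreamSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("LogStreamSketch"), Seconds(10))

    // Consume job logs from a hypothetical "hadoop-job-logs" topic via ZooKeeper
    val logLines = KafkaUtils.createStream(
      ssc, "zk-host:2181", "log-processor", Map("hadoop-job-logs" -> 1)
    ).map(_._2)

    // Illustrative rule: lines flagged with a download failure are "bad images"
    val badImages = logLines.filter(_.contains("IMAGE_DOWNLOAD_FAILED"))

    badImages.foreachRDD { rdd =>
      // In the real pipeline these records would be persisted (e.g. to HBase/HDFS)
      rdd.take(10).foreach(println)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```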
Environment: HSearch (HBase + Lucene), Cassandra, Hive, Spark (Core, SQL, ML, Streaming), Hadoop, MapReduce, Amazon Web Services, Linode, CDN, Scala, Java; affiliate feeds from Rakuten, CJ, Affiliate Window, and Webgains.
Senior Big Data Consultant
Confidential
Responsibilities:
- Wrote MapReduce programs to validate the data.
- Wrote more than 50 Spring Data HBase REST APIs in Java.
- Designed the HBase schema and cleaned the data.
- Wrote Hive queries for analytics on user data.
Environment: Hadoop MapReduce, HBase, Spring Data REST web services, CDH; user payment data.
Senior Big Data Consultant
Confidential
Responsibilities:
- Wrote a simulator to emit events based on the NYC DOT data file.
- Wrote a Kafka producer to accept events and send them to Kafka, where they are consumed by a Storm spout.
- Wrote a Storm topology to accept events from the Kafka spout and process them (see the sketch after this list).
- Wrote Storm bolts to emit data into HBase, HDFS, and RabbitMQ Web STOMP.
- Wrote Hive queries to correlate truck event data, weather data, and traffic data.
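A minimal sketch of a Kafka-spout-to-bolt Storm topology of the kind described above, written in Scala; the ZooKeeper host, topic name, and event format are placeholders, and the HBase/HDFS/RabbitMQ writer bolts are omitted.

```scala
import org.apache.storm.{Config, StormSubmitter}
import org.apache.storm.kafka.{KafkaSpout, SpoutConfig, StringScheme, ZkHosts}
import org.apache.storm.spout.SchemeAsMultiScheme
import org.apache.storm.topology.{BasicOutputCollector, OutputFieldsDeclarer, TopologyBuilder}
import org.apache.storm.topology.base.BaseBasicBolt
import org.apache.storm.tuple.{Fields, Tuple, Values}

// Parses a raw CSV truck event and emits fields for downstream writer bolts.
class ParseEventBolt extends BaseBasicBolt {
  override def execute(input: Tuple, collector: BasicOutputCollector): Unit = {
    val parts = input.getString(0).split(",") // hypothetical "truckId,eventType,..." format
    if (parts.length >= 2) collector.emit(new Values(parts(0), parts(1)))
  }
  override def declareOutputFields(declarer: OutputFieldsDeclarer): Unit =
    declarer.declare(new Fields("truckId", "eventType"))
}

object TruckEventTopologySketch {
  def main(args: Array[String]): Unit = {
    // Kafka spout reading the "truck-events" topic through ZooKeeper
    val spoutConfig = new SpoutConfig(new ZkHosts("zk-host:2181"), "truck-events", "/truck-events", "event-reader")
    spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme())

    val builder = new TopologyBuilder()
    builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 1)
    builder.setBolt("parse-events", new ParseEventBolt(), 2).shuffleGrouping("kafka-spout")

    StormSubmitter.submitTopology("truck-event-topology", new Config(), builder.createTopology())
  }
}
```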
Environment: Hadoop, HDFS, Hive, HBase, Kafka, Storm, RabbitMQ Web STOMP, Google Maps; New York City truck routes from NYC DOT; truck events data generated using a custom simulator; weather data collected using APIs from Forecast.io; traffic data collected using APIs from MapQuest.
Senior Big Data Consultant
Confidential
Responsibilities:
- Set up and configured Hive, Hive on Tez, Impala, Spark SQL, Apache Drill, BigQuery, PrestoDB, Hadoop, Cloudera CDH, and Confidential HDP.
- Designed schemas for the data sets on Hive, Hive on Tez, Impala, Spark SQL, Apache Drill, BigQuery, and PrestoDB.
- Designed queries for the given data set.
- Debugged issues on Hive, Hive on Tez, Impala, Spark SQL, Apache Drill, BigQuery, PrestoDB, Hadoop, Cloudera CDH, and Confidential HDP.
- Compared query times across Hive, Hive on Tez, Impala, Spark SQL, Apache Drill, BigQuery, and PrestoDB (see the timing sketch after this list).
- Compared query times across different cloud platforms.
- Designed web-based visualizations of the timing metrics using Google Charts.
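A minimal sketch of how such per-engine timings can be collected over JDBC; the connection URLs, drivers, and test query are placeholders (BigQuery would be timed through its own client rather than JDBC).

```scala
import java.sql.DriverManager

object QueryTimingSketch {
  // Hypothetical JDBC endpoints for the engines under comparison
  val engines = Map(
    "Hive"     -> "jdbc:hive2://hive-host:10000/default",
    "Impala"   -> "jdbc:hive2://impala-host:21050/;auth=noSasl",
    "PrestoDB" -> "jdbc:presto://presto-host:8080/hive/default"
  )

  // Runs the query once, drains the result set, and returns elapsed milliseconds
  def timeQuery(url: String, sql: String): Long = {
    val conn = DriverManager.getConnection(url)
    try {
      val start = System.nanoTime()
      val rs = conn.createStatement().executeQuery(sql)
      while (rs.next()) {}
      (System.nanoTime() - start) / 1000000
    } finally conn.close()
  }

  def main(args: Array[String]): Unit = {
    val sql = "SELECT category, COUNT(*) FROM events GROUP BY category" // placeholder query
    engines.foreach { case (name, url) => println(s"$name: ${timeQuery(url, sql)} ms") }
  }
}
```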
Environment: Hive, Hive on Tez, Impala, Spark SQL, Apache Drill, BigQuery, PrestoDB, Hadoop, Cloudera
Senior Big Data Consultant
Confidential
Responsibilities:
- Developed a Hadoop MapReduce program to perform sequence alignment on NGS data.
- The MapReduce program implements algorithms such as the Burrows-Wheeler Transform (BWT), the Ferragina-Manzini Index (FMI), and the Smith-Waterman dynamic programming algorithm, using the Hadoop distributed cache (a standalone sketch of Smith-Waterman scoring appears after this list).
- Designed and developed software for bioinformatics and Next-Generation Sequencing (NGS) on the Hadoop MapReduce framework and Cassandra, using Amazon S3, Amazon EC2, and Amazon Elastic MapReduce (EMR).
- Developed a Hadoop MapReduce program to perform custom quality checks on genomic data. Novel features of the program included the ability to handle file-format/sequencing-machine errors, automatic detection of the baseline PHRED score, and being platform-agnostic (Illumina, 454 Roche, Complete Genomics, ABI SOLiD input format data).
- Configured and ran all MapReduce programs on a 20-30 node cluster (Amazon EC2 spot instances) with Apache Hadoop 1.4.0 to handle 600 GB/sample of NGS genomics data.
- Configured a 20-30 node (Amazon EC2 spot instance) Hadoop cluster to transfer data between Amazon S3 and HDFS and to direct input and output to the Hadoop MapReduce framework.
- Successfully ran all Hadoop MapReduce programs on the Amazon Elastic MapReduce framework, using Amazon S3 for input and output.
- Developed Java RESTful web services to upload data from local storage to Amazon S3, list S3 objects, and perform file-manipulation operations.
- Developed MapReduce programs to perform quality checks, sequence alignment, SNP calling, and SV/CNV detection on single-end/paired-end NGS data.
- Designed and migrated an RDBMS (SQL) database to a NoSQL Cassandra database.
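Standalone sketch of the Smith-Waterman local-alignment scoring referenced above; the match/mismatch/gap parameters are illustrative, and the production version ran inside MapReduce with reference data in the distributed cache.

```scala
object SmithWatermanSketch {
  // Returns the best local-alignment score between two sequences using the
  // Smith-Waterman dynamic programming recurrence (scoring values are illustrative).
  def score(a: String, b: String, matchScore: Int = 2, mismatch: Int = -1, gap: Int = -1): Int = {
    val h = Array.ofDim[Int](a.length + 1, b.length + 1) // DP matrix, initialized to 0
    var best = 0
    for (i <- 1 to a.length; j <- 1 to b.length) {
      val diag = h(i - 1)(j - 1) + (if (a(i - 1) == b(j - 1)) matchScore else mismatch)
      h(i)(j) = Seq(0, diag, h(i - 1)(j) + gap, h(i)(j - 1) + gap).max
      if (h(i)(j) > best) best = h(i)(j)
    }
    best
  }

  def main(args: Array[String]): Unit = {
    // Example: score a short read against a reference fragment
    println(score("ACACACTA", "AGCACACA"))
  }
}
```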