
Hadoop Consultant Resume

SUMMARY

  • Overall 8 years of professional experience across the full Software Development Life Cycle (SDLC) and Agile methodology, covering analysis, design, development, testing, implementation and maintenance in Hadoop, Data Warehousing, Linux and Java.
  • More than 5 years of experience providing highly scalable Big Data solutions using Hadoop 2.x, HDFS, MR2, YARN, Kafka, Pig, Hive, Sqoop, HBase, Cloudera Manager, ZooKeeper, Oozie and Hue.
  • Proven expertise in enterprise application product development, building scalable and high-performance Big Data applications using Hadoop, distributed computing and J2EE technologies including Servlets, JSP, Spring, JMS, Struts, Hibernate, Web Services, XML, JNDI, JDBC, CVS, Maven, HTML, CSS and JavaScript.
  • Hands-on experience installing and configuring Amazon EMR, Cloudera (CDH3, CDH4 and CDH5) and Hortonworks Hadoop distributions.
  • Excellent understanding of Big Data, Hadoop architecture, NoSQL and core components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce 2 / YARN programming paradigm.
  • Good understanding and knowledge of NoSQL databases like MongoDB, HBase and Cassandra, as well as Amazon Redshift.
  • Hands-on experience providing real-time data streaming solutions by building ETL pipelines using Apache Spark/Spark Streaming, Apache Storm, Kafka, Flume and HDFS.
  • Extensive experience working with different Spark modules such as core transformations, MLlib, Spark Streaming and Spark SQL.
  • Experience importing and exporting data using Sqoop (structured data) and Flume (log files and XML) between HDFS/Hive/HBase and relational database systems (RDBMS).
  • Integrated various relational sources like Oracle, Teradata, MySQL, Sybase, SQL Server and MS Access, and non-relational sources like flat files, into the staging area.
  • Good experience writing custom UDFs and extending Pig scripts and Hive queries to incorporate complex business logic for high-level data analysis.
  • Experience working with Amazon AWS cloud services (EC2, EBS, S3) and migrating data between platforms, such as from SQL Server to S3.
  • Worked with Oozie and ZooKeeper to manage job workflows and job coordination in the cluster, respectively.
  • Hands-on experience using BI tools like Splunk/Hunk and Tableau for data visualization.
  • Worked on optimizing MapReduce code and Pig scripts, user interface analysis, performance tuning and analysis.
  • Experience in understanding Hadoop security requirements and integrating with Kerberos authentication and authorization infrastructure.
  • Worked extensively with dimensional modeling, data migration, data cleansing, data profiling and ETL processes for data warehouses.
  • Good working experience in Agile/Scrum methodologies, including daily scrum calls and technical discussions with clients on project analysis, specifications and development.
  • Ability to work independently as well as in a team, and to communicate effectively with customers, peers and management at all levels inside and outside the organization.

TECHNICAL SKILLS

Languages/Scripting: Java/J2EE, Scala, Python, C++, Pig Latin, HiveQL, SQL, PL/SQL, Linux shell scripts

Big Data Framework/Stack: Hadoop HDFS, MapReduce, YARN, Hive, Pig, Hue, Impala, Sqoop, HBase, Spark, Oozie, ZooKeeper, Drill, Solr

Hadoop Distributions: Apache, Cloudera CDH5, Hortonworks

Fast Data Technologies: Kafka, Flume, Apache Spark, Apache Storm

RDBMS: Oracle, DB2, SQL Server, MySQL, Sybase, MS Access

NoSQL Databases: HBase, MarkLogic, Cassandra, MongoDB

Software Methodologies: SDLC- Waterfall / Agile, Scrum

Operating Systems: Windows XP/NT/7/8, UNIX, Linux, Mac

Java Technologies: Hibernate, JDBC, ORM, JNDI, JSP, JSON, XML, HTML, Web Services, Spring, Struts

File Formats: XML, Text, SequenceFile, RCFile, JSON, ORC, Avro and Parquet

Amazon Web Services: EMR, EC2, EBS, S3, Redshift, Elastic Beanstalk, CloudFront, Virtual Private Cloud (VPC)

PROFESSIONAL EXPERIENCE

Hadoop Consultant

Confidential

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Developed a data pipeline integrating Kafka and Flume to collect, aggregate and store data from different sources and push it to HDFS.
  • Configured Spark Streaming to ingest sensor data through Kafka onto HDFS for near real-time analytics using Scala/Python (see the streaming sketch after this list).
  • Performed real-time analytics on roughly 5 TB of Call Detail Records (CDRs), ingested with Apache Flume onto HDFS and processed with Spark to identify patterns behind network drops.
  • Extracted the data from web servers onto HDFS using Flume.
  • Performed product analytics specific to local geographies and customer segments to gain better insights.
  • Helped network administrators allocate bandwidth in real time by identifying spikes in call center data.
  • Good experience with Hive partitioning, bucketing and different types of joins on Hive tables, and with Hive SerDes such as RegEx, JSON and Avro.
  • Worked with different Hive file formats like RCFile, SequenceFile, ORC and Parquet.
  • Developed Hive User Defined Functions in Java, compiled them into JARs, added them to HDFS and executed them in Hive queries for data validation and processing (see the UDF sketch after this list).
  • Developed MapReduce applications (Java/Python) using the Hadoop MapReduce programming framework.
  • Implemented different kinds of joins, such as map-side and reduce-side joins, to integrate data from different data sets.
  • Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
  • Loaded and transformed large sets of semi-structured data using Pig Latin operations.
  • Imported data from open data sources into Amazon S3 and pre-processed large data sets in parallel across the Hadoop cluster.
  • Defined job flows in Oozie to schedule and manage Apache Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flow.
  • Implemented a variety of compression techniques such as LZO and Snappy to save storage and optimize data transfer over the network.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning and slot configuration.
  • Assisted in monitoring the Hadoop cluster using Ganglia.
  • Implemented test scripts to support test-driven development and continuous integration.
  • Exported the analyzed data to relational databases using Sqoop for visualization and report generation (Tableau, Splunk) for the BI team.
  • Generated final reporting data with Tableau for testing by connecting to the corresponding Hive tables through the Hive ODBC connector.
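
For illustration, a minimal Java sketch of the kind of Spark Streaming + Kafka ingestion described above (the project itself used Scala/Python). The broker address, topic name, consumer group, batch interval and HDFS path are assumptions, not details from the engagement:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class SensorStreamToHdfs {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("SensorStreamToHdfs");
        // 30-second micro-batches (interval is an assumption)
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");          // hypothetical broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "sensor-ingest");                  // hypothetical group
        kafkaParams.put("auto.offset.reset", "latest");

        // Direct stream from a hypothetical "sensor-events" topic
        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Collections.singletonList("sensor-events"), kafkaParams));

        // Persist each non-empty batch of raw values to HDFS for downstream analytics
        stream.map(ConsumerRecord::value)
              .foreachRDD((rdd, time) -> {
                  if (!rdd.isEmpty()) {
                      rdd.saveAsTextFile("hdfs:///data/sensors/raw/" + time.milliseconds());
                  }
              });

        jssc.start();
        jssc.awaitTermination();
    }
}
```

Writing each batch to a time-stamped HDFS directory keeps the raw feed available for the near real-time analytics jobs that follow.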
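
A minimal sketch of a Hive UDF of the sort mentioned above, using the classic org.apache.hadoop.hive.ql.exec.UDF API; the function name and normalization rule are hypothetical, not the project's actual validation logic:

```java
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical example: strip non-digit characters from phone numbers before validation
@Description(name = "normalize_phone",
             value = "_FUNC_(str) - returns the input with all non-digit characters removed")
public final class NormalizePhoneUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;                                   // pass NULLs through unchanged
        }
        String digits = input.toString().replaceAll("[^0-9]", "");
        return new Text(digits);
    }
}
```

Once compiled into a JAR and placed on HDFS, a function like this would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from queries.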

Hadoop Developer

Confidential

Responsibilities:

  • Moved all crawl data flat files generated from various retailers to HDFS for further processing.
  • Imported/exported data between the Teradata database and HDFS using Sqoop.
  • Optimized Pig scripts and Hive queries to increase efficiency and added new features to existing code.
  • Wrote MapReduce code that takes log files as input, parses the logs and structures them in a tabular format to facilitate effective querying on the log data (see the log-parsing sketch after this list).
  • Created external Hive tables on top of the parsed data.
  • Developed, monitored and optimized MapReduce jobs for data cleaning and preprocessing.
  • Developed Sqoop scripts to enable interaction between Pig and the MySQL database.
  • Extracted feeds from social media sites such as Facebook and Twitter using Python scripts.
  • Worked with the Hadoop administrator on rebalancing blocks and decommissioning nodes in the cluster.
  • Implemented Hibernate for O/R mapping and persistence.
  • Involved in requirement gathering, design, development and testing.
  • Wrote script files for processing data and loading them to HDFS.
  • Installed Oozie Workflow engine to run multiple Hive and Pig Jobs.
  • Developed UNIX shell scripts for creating reports from Hive data.
  • Created two different users (hduser for performing HDFS operations and mapred for performing MapReduce operations only).
  • Managing and reviewing Hadoop log files.
  • Setup Hive with MySQL as a Remote Metastore.
  • Generated aggregations, groupings and visualizations using Tableau.
  • Moved all log/text files generated by various products into HDFS.
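
A minimal map-only MapReduce sketch of the log-to-tabular parsing described above; the Apache-style access-log pattern, class names and paths are assumptions for illustration:

```java
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LogToTabular {

    // Hypothetical access-log layout: host, timestamp, request, status, bytes
    private static final Pattern LOG_LINE = Pattern.compile(
            "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)");

    public static class ParseMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            Matcher m = LOG_LINE.matcher(value.toString());
            if (m.find()) {
                // Emit tab-separated columns so Hive/Pig can query the output directly
                String row = String.join("\t",
                        m.group(1), m.group(2), m.group(3), m.group(4), m.group(5));
                context.write(NullWritable.get(), new Text(row));
            } else {
                context.getCounter("logs", "malformed").increment(1);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "log-to-tabular");
        job.setJarByClass(LogToTabular.class);
        job.setMapperClass(ParseMapper.class);
        job.setNumReduceTasks(0);                           // map-only: parse and write rows
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Because the output is plain tab-delimited text, an external Hive table (as in the bullets above) can be defined directly over the job's output directory.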

Hadoop Developer

Confidential

Responsibilities:

  • Writing MapReduce jobs using the Java API.
  • Writing shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
  • Managing and scheduling jobs on a Hadoop cluster.
  • Deployed Hadoop clusters in different modes: standalone, pseudo-distributed and fully distributed.
  • Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Installed and maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase and Sqoop.
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Developed Pig UDFs to pre-process the data for analysis (see the Pig UDF sketch after this list).
  • Developed Hive queries for the analysts.
  • Writing Hive queries for data analysis to meet the business requirements.
  • Developed workflows in Oozie to automate loading data into HDFS and pre-processing it with Pig.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
  • Took part in monitoring, troubleshooting and managing Hadoop log files.
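
A minimal sketch of a Pig EvalFunc UDF of the kind described above; the class name and the cleaning rule (trim and lower-case a field) are hypothetical:

```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical pre-processing UDF: trims and lower-cases a chararray field
public class CleanField extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;                      // Pig treats null as missing data
        }
        return input.get(0).toString().trim().toLowerCase();
    }
}
```

After packaging into a JAR, such a UDF would normally be made available to scripts with Pig's REGISTER statement and invoked inside a FOREACH ... GENERATE expression.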
