Hadoop Developer Resume
Deerfield, IL
SUMMARY
- Overall 6 years of IT experience across a variety of industries, including hands-on experience as a Hadoop developer.
- Expertise with tools in the Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce, Sqoop, Flume, Spark, HBase, YARN, Oozie, and ZooKeeper.
- Excellent knowledge of Hadoop ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Experience designing and developing applications in Spark using Scala and comparing the performance of Spark with Hive and SQL/Oracle.
- Strong experience writing applications using Python, Scala, and MySQL.
- Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
- Strong experience with Hadoop distributions such as Cloudera, MapR, and Hortonworks.
- Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases such as HBase.
- Experienced in writing complex MapReduce programs that work with different file formats such as Text, SequenceFile, XML, Parquet, and Avro.
- Experience using the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Experience migrating data between HDFS and relational database systems using Sqoop.
- Extensive experience importing and exporting data using streaming data-ingestion tools such as Flume.
- Very good experience in the complete project life cycle (design, development, testing, and implementation) of client-server and web applications.
- Excellent Java development skills using J2EE, J2SE, and web services.
- Strong experience in Object-Oriented Design, Analysis, Development, Testing and Maintenance.
- Excellent implementation knowledge of enterprise, web, and client-server applications using Java and J2EE.
- Worked in large and small teams on systems requirements, design, and development.
- Prepared standard coding guidelines and analysis and testing documentation.
- Experience working with Hadoop in standalone, pseudo-distributed, and fully distributed modes.
- Good knowledge of cloud computing with Amazon Web Services, such as EC2 and S3, which provide fast and efficient processing of big data.
TECHNICAL SKILLS
Big Data/Hadoop: HDFS, MapReduce, ZooKeeper, Hive, Pig, Sqoop, Flume, Oozie, Spark, HBase, and Apache Kafka
Cloud Computing: Amazon Web Services.
Languages: Java/J2EE, Python, Scala, and MySQL
Databases: Oracle (SQL & PL/SQL), MySQL, HBase.
IDE: Eclipse
XML Related and Others: XML, DTD, XSD, XSLT, JAXB, JAXP, CSS, AJAX, JavaScript.
PROFESSIONAL EXPERIENCE
Confidential, Deerfield, IL
Hadoop Developer
Responsibilities:
- Analyzed data in the Hadoop cluster using different big data analytic tools, including Pig, Hive, and MapReduce.
- Managed the fully distributed Hadoop cluster as an additional assigned responsibility.
- Trained to take over the responsibilities of a Hadoop administrator, including managing the cluster and handling upgrades and installation of tools that use the Hadoop ecosystem.
- Installed and configured ZooKeeper to coordinate and monitor cluster resources.
- Implemented test scripts to support test driven development and continuous integration.
- Worked on POCs with Apache Spark using Scala to introduce Spark into the project.
- Consumed data from Kafka using Apache Spark (see the PySpark sketch after this list).
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Involved in loading data from the Linux file system into HDFS.
- Imported and exported data into HDFS and Hive using Sqoop.
- Implemented partitioning, dynamic partitions, and bucketing in Hive.
- Provided daily production support to monitor and troubleshoot Hadoop/Hive jobs.
- Created HBase tables to load large sets of semi-structured data coming from various sources.
- Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregate Functions (UDAFs) written in Python (see the Hive TRANSFORM sketch after this list).
- Ran Hadoop Streaming jobs to process terabytes of XML-format data.
- Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs.
- Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig.
- Responsible for loading data files from various external sources, such as MySQL, into the staging area in MySQL databases.
- Executed Hive queries on Parquet tables stored in Hive to perform data analysis meeting the business requirements.
- Actively involved in code reviews and bug fixing to improve performance.
- Handled data manipulation using Python scripts.
- Involved in developing, building, testing, and deploying applications to the Hadoop cluster in distributed mode.
- Created Linux shell scripts to automate the daily ingestion of IVR data.
- Created HBase tables to store incoming data in various formats from different portfolios.
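For illustration, a minimal PySpark sketch of the Kafka consumption described in the responsibilities above (the project implementation used Scala; the broker address, topic name, and HDFS paths below are hypothetical placeholders):

```python
# Minimal sketch: consume a Kafka topic with Spark Structured Streaming and land it on HDFS.
# Broker, topic, and paths are placeholders, not values from this project.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("kafka-consumer-sketch")
         .getOrCreate())

# Subscribe to a Kafka topic; requires the spark-sql-kafka connector package.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker-1:9092")  # placeholder broker
          .option("subscribe", "ivr_events")                   # placeholder topic
          .option("startingOffsets", "latest")
          .load())

# Kafka delivers key/value as binary; cast the value to a string for downstream parsing.
messages = events.selectExpr("CAST(value AS STRING) AS value")

# Write the raw messages to HDFS; a checkpoint location is required for streaming sinks.
query = (messages.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/raw/ivr_events")              # placeholder path
         .option("checkpointLocation", "hdfs:///checkpoints/ivr_events")
         .start())

query.awaitTermination()
```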
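Likewise, a minimal sketch of one common way to plug Python logic into Hive: a streaming script invoked through TRANSFORM. The script name and column layout (call_id, raw_xml) are hypothetical:

```python
#!/usr/bin/env python
# clean_payload.py - minimal sketch of a Python streaming script used from Hive via TRANSFORM.
# Hive pipes the selected columns to stdin as tab-separated lines; transformed rows are
# written to stdout. The column layout here is a hypothetical example.
import sys

for line in sys.stdin:
    call_id, raw_payload = line.rstrip("\n").split("\t", 1)
    cleaned = raw_payload.strip().lower()  # example normalization step
    print("\t".join([call_id, cleaned, str(len(cleaned))]))
```

On the Hive side such a script would be wired in roughly as `ADD FILE clean_payload.py; SELECT TRANSFORM(call_id, raw_xml) USING 'python clean_payload.py' AS (call_id, payload, payload_len) FROM calls;` (table and column names are likewise hypothetical).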
Environment: Hadoop, HDFS, Pig, Apache Hive, Sqoop, Apache Spark, Shell Scripting, HBase, Python, Zookeeper, MySQL.
Confidential, TX
Hadoop Developer
Responsibilities:
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Configured Sqoop jobs to import data from RDBMS into HDFS using Oozie workflows.
- Involved in creating Hive internal and external tables, loading data, and writing Hive queries that run internally as MapReduce jobs.
- Created batch analysis job prototypes using Hadoop, Pig, Oozie and Hive.
- Assisted with data capacity planning and node forecasting.
- Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
- Involved in analyzing system failures, identifying root causes, and recommending courses of action.
- Documented system processes and procedures for future reference.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Performed CRUD operations in HBase (see the sketch after this list).
- Developed Hive queries to process the data.
- Handled monitoring, performance tuning, job-performance screening, and capacity planning for Hadoop clusters; monitored Hadoop cluster connectivity and security; managed and reviewed Hadoop log files.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
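For illustration, a minimal sketch of the HBase CRUD operations mentioned above. The resume does not name a client, so this assumes the Python happybase library talking to the HBase Thrift gateway; the host, table, and column-family names are placeholders:

```python
# Sketch of basic HBase CRUD using the happybase client over the Thrift gateway.
# Host, table name, and column family are placeholder assumptions, not project details.
import happybase

connection = happybase.Connection("hbase-thrift-host")   # placeholder Thrift host
table = connection.table("customer_events")              # placeholder table

# Create / update: put a row keyed by customer id.
table.put(b"cust-0001", {b"cf:status": b"active", b"cf:region": b"IL"})

# Read: fetch a single row, or scan a key range by prefix.
row = table.row(b"cust-0001")
for key, data in table.scan(row_prefix=b"cust-"):
    print(key, data)

# Delete: remove the row (specific columns can be targeted with the `columns` argument).
table.delete(b"cust-0001")

connection.close()
```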
Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Cloudera Manager, Pig, Sqoop, Oozie, HBase, Linux, Cluster Management
Confidential
Hadoop Engineer
Responsibilities:
- Responsible for analyzing large data sets and deriving customer usage patterns by developing new MapReduce programs.
- Wrote MapReduce code to parse data from various sources and store the parsed data in HBase and Hive.
- Worked on creating combiners, partitioners, and distributed cache to improve the performance of MapReduce jobs.
- Developed shell scripts to perform data profiling on the ingested data with the help of Hive bucketing.
- Responsible for debugging and optimizing Hive scripts and implementing de-duplication logic in Hive using a rank function (see the sketch after this list).
- Wrote Hive validation scripts used in the validation framework for daily analysis through graphs presented to business users.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig and Hive.
- Imported customer-specific personal data into Hadoop using Sqoop from various relational databases such as Netezza and Oracle.
- Used Impala to read, write and query the Hadoop data in HDFS and HBase.
- Worked with BI teams to generate reports and design ETL workflows in Tableau.
- Developed testing scripts in Python, prepared test procedures, analyzed test result data, and suggested improvements to the system and software.
- Streamed log data using Flume and performed data analytics using Hive.
- Extracted data from RDBMSs (Oracle, MySQL, and Teradata) into HDFS using Sqoop.
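For illustration, a minimal sketch of the Hive de-duplication pattern mentioned above, keeping the most recent record per business key with a rank-style window function. The PyHive client and the table/column names are assumptions for illustration only:

```python
# Sketch of de-duplication in Hive: keep the latest record per key using ROW_NUMBER().
# The PyHive client, host, and table/column names (customer_stage, customer_clean,
# customer_id, load_ts) are placeholder assumptions, not project details.
from pyhive import hive

DEDUP_QUERY = """
INSERT OVERWRITE TABLE customer_clean
SELECT customer_id, name, load_ts
FROM (
    SELECT customer_id, name, load_ts,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY load_ts DESC) AS rnk
    FROM customer_stage
) ranked
WHERE rnk = 1
"""

conn = hive.connect(host="hive-server-host")  # placeholder HiveServer2 host
cursor = conn.cursor()
cursor.execute(DEDUP_QUERY)
cursor.close()
conn.close()
```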
Environment: Hadoop, MapReduce, HDFS, Pig, HiveQL, HBase, ZooKeeper, Oozie, Flume, Impala, Cloudera, MySQL, UNIX Shell Scripting, Tableau, Python, Spark.