
Hadoop Developer Resume


Deerfield, IL

SUMMARY

  • Overall 6 years of IT experience across a variety of industries, including hands-on experience as a Hadoop developer.
  • Expertise with tools in the Hadoop ecosystem including Pig, Hive, HDFS, MapReduce, Sqoop, Flume, Spark, HBase, YARN, Oozie, and ZooKeeper.
  • Hands-on experience in machine learning, big data, data visualization, R and Python development, Linux, SQL, and Git/GitHub.
  • Excellent knowledge of Hadoop ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Strong experience in writing applications using Python, Scala, and MySQL.
  • Hands-on experience configuring Hadoop clusters in a professional environment and on Amazon Web Services (AWS) using EC2 instances.
  • Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
  • Strong experience with Hadoop distributions like Cloudera, MapR, and Hortonworks.
  • Good understanding of NoSQL databases and hands on work experience in writing applications on NoSQL databases like HBase.
  • Experienced in writing complex MapReduce programs that work with different file formats like Text, SequenceFile, XML, Parquet, and Avro.
  • Experience using the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
  • Experienced in Python data manipulation for loading and extraction, as well as Python libraries such as NumPy.
  • Experience in migrating data using Sqoop between HDFS and relational database systems, in both directions.
  • Extensive experience importing and exporting data using streaming platforms like Flume.
  • Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
  • Successfully migrated multiple Scala and PySpark applications from old clusters to new LCM clusters.
  • Excellent Java development skills using J2EE and J2SE, including web services.
  • Extensive working experience with Python, including Scikit-learn, SciPy, Pandas, and NumPy, developing machine learning models and manipulating and handling data.
  • Strong experience in Object-Oriented Design, Analysis, Development, Testing and Maintenance.
  • Excellent implementation knowledge of Enterprise/Web/Client Server using Java, J2EE.
  • Worked in large and small teams on systems requirements, design, and development.
  • Prepared standard coding guidelines and analysis and testing documentation.
  • Extracted data from HDFS using Hive and Presto, performed data analysis and feature selection using Spark with Scala and PySpark, and created nonparametric models in Spark (a minimal sketch appears after this list).
  • Experience in working with Hadoop in standalone, pseudo-distributed, and fully distributed modes.
  • Good knowledge of cloud computing with Amazon Web Services such as EC2 and S3, which provide fast and efficient processing of big data.
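A minimal sketch of the Hive-extraction and nonparametric-modeling flow mentioned in the bullets above, assuming a PySpark environment with Hive support; the database, table, and column names are hypothetical placeholders rather than details from the actual projects.

```python
# Hypothetical PySpark sketch: extract a Hive table from HDFS, assemble
# features, and fit a nonparametric (random forest) model with Spark ML.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier

spark = (SparkSession.builder
         .appName("hive-extract-model")
         .enableHiveSupport()   # read tables registered in the Hive metastore
         .getOrCreate())

# Extract data from a Hive table stored on HDFS (table/columns are placeholders)
df = spark.sql("SELECT amount, visits, tenure_days, churned FROM sales_db.transactions")

# Assemble the feature columns into the single vector column Spark ML expects
assembler = VectorAssembler(inputCols=["amount", "visits", "tenure_days"],
                            outputCol="features")
train = assembler.transform(df).select("features", "churned")

# Fit a nonparametric model (random forest classifier) on the extracted data
rf = RandomForestClassifier(labelCol="churned", featuresCol="features", numTrees=50)
model = rf.fit(train)
```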

TECHNICAL SKILLS

Big Data Tools: Hadoop, HDFS, Sqoop, HBase, Hive, Spark, Kafka, Airflow, PySpark

Cloud Technologies: Snowflake, SnowSQL, AWS, Azure, Databricks

ETL Tools: SSIS, Talend

Modeling and Architecture Tools: Erwin, ER Studio, Star-Schema and Snowflake-Schema Modeling, FACT and dimension tables, Pivot Tables

Database: Snowflake Cloud Database, Oracle, MS SQL Server, Teradata, MySQL, DB2

Operating Systems: Microsoft Windows and Unix

Reporting Tools: MS Excel, Tableau, Tableau server, Tableau Reader, Power BI, QlikView

Methodologies: Agile, UML, System Development Life Cycle (SDLC), Ralph Kimball, Waterfall Model

Machine Learning: Regression Models, Classification Models, Clustering, Linear Regression, Logistic Regression, Decision Trees, Random Forest, Gradient Boosting, K-Nearest Neighbors (KNN), K-Means, Naïve Bayes, Time Series Analysis, PCA, Avro, MLbase

Python and R Libraries: R - tidyr, tidyverse, dplyr, lubridate, ggplot2, tseries; Python - NumPy, SciPy, Matplotlib, Seaborn, pandas, scikit-learn

Programming Languages: SQL, R (shiny, R-studio), Python (Jupyter Notebook, PyCharm IDE)

PROFESSIONAL EXPERIENCE

Confidential, Deerfield, IL

Hadoop Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster using different big data analytic tools including Pig, Hive, and MapReduce.
  • Managed the fully distributed Hadoop cluster as an additional responsibility; trained to take over Hadoop Administrator duties, including cluster management, upgrades, and installation of tools in the Hadoop ecosystem.
  • Used the PySpark DataFrame approach to create the Cal, Dapply, and Payfone reporting tables.
  • Worked on installing and configuring ZooKeeper to coordinate and monitor cluster resources.
  • Implemented test scripts to support test driven development and continuous integration.
  • Worked on POCs with Apache Spark using Scala to implement Spark in the project.
  • Consumed data from Kafka using Apache Spark.
  • Configured Linux native device mappers (MPIO) and EMC PowerPath for RHEL 5.5, 5.6, and 5.7.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Involved in loading data from the Linux file system to HDFS.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Provided daily production support to monitor and troubleshoot Hadoop/Hive jobs.
  • Worked on creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Worked with structured/semi-structured data ingestion and processing on AWS using S3 and Python; migrated on-premises big data workloads to Snowflake and Redshift on AWS and to Azure Databricks.
  • Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregating Functions (UDAFs) written in Python.
  • Ran Hadoop streaming jobs to process terabytes of XML-format data.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Responsible for loading data files from various external sources like MySQL into the staging area in MySQL databases.
  • Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
  • Developed real-time data processing applications using Scala and Python, and implemented Apache Spark Streaming from streaming sources like Kafka (a minimal sketch appears after this list).
  • Successfully migrated multiple Scala and PySpark applications from old clusters to new LCM clusters.
  • Actively involved in code review and bug fixing for improving the performance.
  • Good experience in data manipulation using Python scripts.
  • Involved in developing, building, testing, and deploying to the Hadoop cluster in distributed mode.
  • Created Linux shell scripts to automate the daily ingestion of IVR data.
  • Used Python and R scripting to implement machine learning algorithms for data prediction and forecasting, yielding better results.
  • Created HBase tables to store various data formats of incoming data from different portfolios.
  • Actively involved in a proof of concept for a Hadoop cluster in AWS; used EC2 instances, EBS volumes, and S3 to configure the cluster.
  • Involved in migrating on-premises data to AWS.
  • Created Hive tables and loaded data from the local file system to HDFS.
  • Production experience in large environments using configuration management tools like Chef and Puppet supporting Chef Environment with 250+ servers and involved in developing manifests.
  • Created EC2 instances and implemented large multi-node Hadoop clusters in the AWS cloud from scratch using automation tools such as Terraform.
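The Kafka bullet above references the following hedged, minimal sketch of consuming a Kafka topic with PySpark Structured Streaming and landing the records on HDFS as Parquet for later Hive analysis. The broker address, topic name, and paths are placeholders, and it assumes the spark-sql-kafka connector package is available on the cluster.

```python
# Hypothetical PySpark Structured Streaming sketch: consume a Kafka topic and
# append the records to HDFS as Parquet (assumes the spark-sql-kafka package).
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

# Subscribe to a Kafka topic; records arrive as binary key/value columns
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
       .option("subscribe", "ivr_events")                    # placeholder topic
       .load())

# Decode the payload and keep the Kafka timestamp for downstream processing
events = raw.select(col("value").cast("string").alias("payload"),
                    col("timestamp"))

# Append the stream to HDFS as Parquet so Hive/Spark SQL can query it later
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/ivr/events")
         .option("checkpointLocation", "hdfs:///checkpoints/ivr_events")
         .outputMode("append")
         .start())

query.awaitTermination()
```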

Environment: Hadoop, HDFS, Pig, Apache Hive, Sqoop, Apache Spark, Shell Scripting, HBase, Python, Zookeeper, MySQL.

Confidential, TX

Hadoop Developer

Responsibilities:

  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Configured Sqoop Jobs to import data from RDBMS into HDFS using Oozie workflows.
  • Involved in creating Hive internal and external tables, loading data, and writing Hive queries that run internally as MapReduce jobs.
  • Used the PySpark DataFrame approach to create the Cal, Dapply, and Payfone reporting tables.
  • Created batch analysis job prototypes using Hadoop, Pig, Oozie and Hive.
  • Assisted with data capacity planning and node forecasting.
  • Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action.
  • Documented system processes and procedures for future reference.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Used Python and R scripting to implement machine learning algorithms for data prediction and forecasting, yielding better results.
  • Built these applications using the Spark Scala API and the PySpark API.
  • Performed CRUD operations in HBase.
  • Developed Hive queries to process the data.
  • Performed monitoring and performance tuning of Hadoop clusters, screened Hadoop cluster job performance, handled capacity planning, monitored cluster connectivity and security, and managed and reviewed Hadoop log files.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Developed data access logic for storing, retrieving, and acting on housed data.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them (see the sketch after this list).
  • Hands-on experience provisioning and managing multi-node Hadoop clusters on the Amazon Web Services (AWS) public cloud using EC2 and on private cloud infrastructure.
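A minimal sketch of the S3-to-RDD flow referenced above, assuming the cluster is configured with S3 credentials and the hadoop-aws (s3a) connector; the bucket, key prefix, and record layout are hypothetical.

```python
# Hypothetical PySpark sketch: load S3 text data into an RDD, then apply
# transformations (map/filter) and actions (count/collect).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s3-rdd-analysis").getOrCreate()
sc = spark.sparkContext

# Load raw text files from S3 into an RDD (one element per line)
lines = sc.textFile("s3a://example-bucket/landing/clickstream/*.csv")

# Transformations: parse rows and keep well-formed records as (page, latency_ms)
parsed = (lines.map(lambda line: line.split(","))
               .filter(lambda fields: len(fields) == 3)
               .map(lambda fields: (fields[0], int(fields[2]))))

# Actions: trigger computation and pull aggregated results to the driver
total_records = parsed.count()
avg_latency_by_page = (parsed.mapValues(lambda v: (v, 1))
                             .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
                             .mapValues(lambda s: s[0] / s[1])
                             .collect())

print(total_records, avg_latency_by_page[:5])
```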

Environment: Hadoop, MapReduce, HDFS, Hive, HBase, Java, SQL, Cloudera Manager, Pig, Sqoop, Oozie, Linux, Cluster Management
