Hadoop Developer Resume
Deerfield, IL
SUMMARY
- Overall 6 years of IT experience across a variety of industries, including hands-on experience as a Hadoop developer.
- Expertise with tools in the Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce, Sqoop, Flume, Spark, HBase, YARN, Oozie, and Zookeeper.
- Hands-on experience in machine learning, big data, data visualization, R and Python development, Linux, SQL, and Git/GitHub.
- Excellent knowledge of Hadoop ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Strong experience in writing applications using Python, Scala, and MySQL.
- Hands on experience on configuring a Hadoop cluster in a professional environment and on Amazon Web Services (AWS) using an EC2 instance.
- Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
- Strong experience with Hadoop distributions such as Cloudera, MapR, and Hortonworks.
- Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases like HBase.
- Experienced in writing complex MapReduce programs that work with different file formats such as Text, SequenceFile, XML, Parquet, and Avro.
- Experience with the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Experienced in python data manipulation for loading and extraction as well as with python libraries such as NumPy.
- Experience in migrating data using Sqoop from HDFS to relational database systems and vice versa.
- Extensive experience importing and exporting data using streaming data-ingestion tools like Flume.
- Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
- Successfully migrated multiple Scala and PySpark applications from old clusters to new LCM clusters.
- Excellent Java development skills using J2EE, J2SE, and web services.
- Extensive working experience with Python, including scikit-learn, SciPy, pandas, and NumPy, for developing machine learning models and manipulating and handling data.
- Strong experience in Object-Oriented Design, Analysis, Development, Testing and Maintenance.
- Excellent implementation knowledge of Enterprise/Web/Client Server using Java, J2EE.
- Worked in large and small teams for systems requirement, design & development.
- Preparation of Standard Code guidelines, analysis and testing documentations.
- Extracted data from HDFS using Hive and Presto; performed data analysis and feature selection using Spark with Scala and PySpark, and created nonparametric models in Spark.
- Experience working with Hadoop in standalone, pseudo-distributed, and fully distributed modes.
- Good knowledge of cloud computing with Amazon Web Services such as EC2 and S3, which provide fast and efficient processing of big data.
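The MapReduce experience above follows the standard map → shuffle/sort → reduce pattern. A minimal, self-contained Python sketch in the Hadoop Streaming style (the word-count task, sample input, and all names are illustrative, not from a specific project):

```python
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    # Map phase: emit (word, 1) pairs, as a Hadoop Streaming mapper
    # would write tab-separated key/value lines to stdout.
    for line in lines:
        for word in line.strip().split():
            yield (word.lower(), 1)

def reducer(sorted_pairs):
    # Reduce phase: sum counts per word. Hadoop's shuffle/sort delivers
    # pairs grouped by key; sorting emulates that guarantee locally.
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield (word, sum(n for _, n in group))

if __name__ == "__main__":
    sample = ["big data big wins", "data pipelines"]
    result = dict(reducer(sorted(mapper(sample))))
    print(result)  # {'big': 2, 'data': 2, 'pipelines': 1, 'wins': 1}
```

On a real cluster the same two functions run as separate mapper and reducer scripts under `hadoop jar hadoop-streaming.jar`, with HDFS providing the input and the framework providing the sort.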
TECHNICAL SKILLS
Big Data Tools: Hadoop, HDFS, Sqoop, HBase, Hive, Spark, Kafka, Airflow, PySpark
Cloud Technologies: Snowflake, SnowSQL, AWS, Azure, Databricks
ETL Tools: SSIS, Talend
Modeling and Architecture Tools: Erwin, ER Studio, Star-Schema and Snowflake-Schema Modeling, FACT and dimension tables, Pivot Tables
Database: Snowflake Cloud Database, Oracle, MS SQL Server, Teradata, MySQL, DB2
Operating Systems: Microsoft Windows and Unix
Reporting Tools: MS Excel, Tableau, Tableau server, Tableau Reader, Power BI, QlikView
Methodologies: Agile, UML, System Development Life Cycle (SDLC), Ralph Kimball, Waterfall Model
Machine Learning: Regression Models, Classification Models, Clustering, Linear Regression, Logistic Regression, Decision Trees, Random Forest, Gradient Boosting, K-Nearest Neighbors (KNN), K-Means, Naïve Bayes, Time Series Analysis, PCA, Avro, MLbase
Python and R Libraries: R - tidyr, tidyverse, dplyr, lubridate, ggplot2, tseries; Python - NumPy, SciPy, matplotlib, seaborn, pandas, scikit-learn
Programming Languages: SQL, R (shiny, R-studio), Python (Jupyter Notebook, PyCharm IDE)
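One of the simplest techniques in the machine-learning list above, simple linear regression, reduces to a closed-form least-squares fit. A self-contained plain-Python sketch (the data points are made up for illustration):

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = a*x + b: slope is the covariance
    # of x and y divided by the variance of x; intercept follows from
    # the means. This is the closed form behind simple linear regression.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    b = my - a * mx
    return a, b

if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.1, 4.1, 6.1, 8.1]  # roughly y = 2x + 0.1
    a, b = fit_line(xs, ys)
    print(a, b)  # a ≈ 2.0, b ≈ 0.1
```

Libraries such as scikit-learn (`LinearRegression`) or R's `lm()` compute the same coefficients, with the multivariate generalization handled by linear algebra.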
PROFESSIONAL EXPERIENCE
Confidential, Deerfield, IL
Hadoop Developer
Responsibilities:
- Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive and MapReduce.
- Managed a fully distributed Hadoop cluster as an additional responsibility; trained to take over Hadoop Administrator duties, including cluster management, upgrades, and installation of Hadoop ecosystem tools.
- Used the PySpark DataFrame approach to create the Cal, Dapply, and Payfone reporting tables.
- Installed and configured Zookeeper to coordinate and monitor cluster resources.
- Implemented test scripts to support test driven development and continuous integration.
- Worked on POCs with Apache Spark using Scala to implement Spark in the project.
- Consumed data from Kafka using Apache Spark.
- Configured Linux native device mappers (MPIO) and EMC PowerPath for RHEL 5.5, 5.6, and 5.7.
- Load and transform large sets of structured, semi structured and unstructured data.
- Involved in loading data from the Linux file system to HDFS.
- Imported and exported data into HDFS and Hive using Sqoop.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Provided daily production support, monitoring and troubleshooting Hadoop/Hive jobs.
- Created HBase tables to load large sets of semi-structured data coming from various sources.
- Worked with structured/semi-structured data ingestion and processing on AWS using S3 and Python; migrated on-premises big data workloads to AWS Snowflake/Redshift and Azure Databricks.
- Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregate Functions (UDAFs) written in Python.
- Experienced in running Hadoop Streaming jobs to process terabytes of XML-format data.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Responsible for loading data files from various external sources, such as MySQL, into the staging area in MySQL databases.
- Executed Hive queries on Parquet tables stored in Hive to perform data analysis and meet business requirements.
- Developed real-time data processing applications using Scala and Python, and implemented Apache Spark Streaming from streaming sources such as Kafka.
- Successfully migrated multiple Scala and PySpark applications from old clusters to new LCM clusters.
- Actively involved in code review and bug fixing for improving the performance.
- Good experience handling data manipulation using Python scripts.
- Involved in developing, building, testing, and deploying applications to the Hadoop cluster in distributed mode.
- Created Linux shell scripts to automate the daily ingestion of IVR data.
- Used Python and R scripting to implement machine learning algorithms for prediction and forecasting.
- Created HBase tables to store various data formats of incoming data from different portfolios.
- Actively involved in a proof of concept for a Hadoop cluster in AWS; used EC2 instances, EBS volumes, and S3 to configure the cluster.
- Involved in migrating on-premises data to AWS.
- Used Hive and created Hive tables, loaded data from Local file system to HDFS.
- Production experience in large environments using configuration management tools like Chef and Puppet, supporting a Chef environment with 250+ servers and developing manifests.
- Created EC2 instances and implemented large multi-node Hadoop clusters in the AWS cloud from scratch using automation tools such as Terraform.
Environment: Hadoop, HDFS, Pig, Apache Hive, Sqoop, Apache Spark, Shell Scripting, HBase, Python, Zookeeper, MySQL.
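The Python UDF work for Hive listed above is typically wired up through Hive's TRANSFORM clause, which pipes each row of a SELECT to a script over stdin/stdout. A minimal sketch, assuming a hypothetical `user_id`/`amount` schema and an invented "tier" derived column:

```python
import sys

def transform_row(line):
    # Hypothetical input schema: user_id \t amount.
    # Emits user_id \t amount \t tier - the kind of derived column
    # a Hive TRANSFORM script would add to each row.
    user_id, amount = line.rstrip("\n").split("\t")
    tier = "high" if float(amount) >= 100.0 else "low"
    return "\t".join([user_id, amount, tier])

def main(stdin=sys.stdin, stdout=sys.stdout):
    # Hive streams rows to this script, e.g.:
    #   SELECT TRANSFORM(user_id, amount) USING 'python udf.py'
    #   AS (user_id, amount, tier) FROM orders;
    for line in stdin:
        if line.strip():
            stdout.write(transform_row(line) + "\n")

if __name__ == "__main__":
    import io
    # Local demo with canned rows instead of a live Hive pipe.
    main(stdin=io.StringIO("u42\t150.00\nu7\t19.99\n"))
```

The same stdin/stdout contract applies to Pig's STREAM operator, which is why one Python script can often serve both engines.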
Confidential, TX
Hadoop Developer
Responsibilities:
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Configured Sqoop Jobs to import data from RDBMS into HDFS using Oozie workflows.
- Involved in creating Hive internal and external tables, loading data, and writing Hive queries, which run internally as MapReduce jobs.
- Used the PySpark DataFrame approach to create the Cal, Dapply, and Payfone reporting tables.
- Created batch analysis job prototypes using Hadoop, Pig, Oozie and Hive.
- Assisted with data capacity planning and node forecasting.
- Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
- Involved in analyzing system failures, identifying root causes, and recommending courses of action.
- Documented system processes and procedures for future reference.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Used Python and R scripting to implement machine learning algorithms for prediction and forecasting.
- Built these applications using the Spark Scala API and the PySpark API.
- Performed CRUD operations in HBase.
- Developed Hive queries to process the data.
- Monitored and performance-tuned Hadoop clusters, screened Hadoop cluster job performance, handled capacity planning, monitored cluster connectivity and security, and managed and reviewed Hadoop log files.
- Load and transform large sets of structured, semi structured and unstructured data.
- Developed data access logic for storing, retrieving, and acting on housed data.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
- Hands-on experience provisioning and managing multi-node Hadoop clusters on the Amazon Web Services (AWS) public cloud (EC2) and on private cloud infrastructure.
Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Cloudera Manager, Pig, Sqoop, Oozie, HBase, Linux, Cluster Management
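The S3-to-Spark RDD work above chains transformations such as map, filter, and reduceByKey before an action materializes the result. A rough local sketch in plain Python (no cluster; the clickstream records and field names are invented for illustration, with reduceByKey emulated via sort-and-group):

```python
from itertools import groupby
from operator import itemgetter

def reduce_by_key(pairs, fn):
    # Local stand-in for Spark's RDD.reduceByKey: group (key, value)
    # pairs by key after sorting, then fold each group's values with fn.
    out = []
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        values = [v for _, v in group]
        acc = values[0]
        for v in values[1:]:
            acc = fn(acc, v)
        out.append((key, acc))
    return out

# Hypothetical "page,latency_ms" records, standing in for lines read from S3.
records = ["home,200", "cart,500", "home,100", "checkout,900"]

# map -> filter -> reduceByKey, mirroring the RDD transformation chain.
pairs = [(page, int(ms)) for page, ms in (r.split(",") for r in records)]
slow = [(page, ms) for page, ms in pairs if ms >= 100]
totals = reduce_by_key(slow, lambda a, b: a + b)
print(totals)  # [('cart', 500), ('checkout', 900), ('home', 300)]
```

In actual PySpark the same pipeline would be `sc.textFile("s3://...").map(parse).filter(pred).reduceByKey(add)`, with the shuffle doing the grouping that `sorted` + `groupby` fake here.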