Hadoop Developer Resume
Detroit, MI
SUMMARY
- Over 4 years of experience in development, design, integration, and presentation with Java, along with extensive Big Data/Hadoop experience across the Hadoop ecosystem, including Hive, Pig, Sqoop, Zookeeper, HBase, Spark, and AWS.
- Experience implementing big data projects using Cloudera.
- Installed, Configured and Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Expertise in Big Data architectures such as Hadoop distributed systems (Azure, Hortonworks, Cloudera), MongoDB, and NoSQL.
- Hands-on experience loading data into Spark RDDs and performing in-memory data computation (see the sketch at the end of this summary).
- Strong understanding of Data Modeling and experience with Data Cleansing, Data Profiling and Data analysis.
- Good knowledge of Amazon AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data.
- Extensive experience in Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
- Experience in analyzing data using Hive, Pig Latin, and custom MR programs in Java.
- Hands on experience in writing Spark SQL scripting.
- Sound knowledge in programming Spark using Scala.
- Good understanding in processing of real-time data using Spark.
- Experienced with NoSQL databases - HBase, Cassandra, and MongoDB - including database performance tuning and data modelling.
- Strong Experience in Front End Technologies like JSP, HTML5, jQuery, JavaScript, CSS3.
- Experienced in using Spark to improve the performance and optimization of existing algorithms in Hadoop, leveraging Spark Context, Spark SQL, DataFrames, Pair RDDs, and Spark on YARN.
- Extensive knowledge in programming with Resilient Distributed Datasets (RDDs).
- Configured Hadoop clusters in OpenStack and Amazon Web Services (AWS).
- Experience in ETL (Data stage) analysis, designing, developing, testing, and implementing ETL processes including performance tuning and query optimizing of databases.
- Experience in extracting source data from Sequential files, XML files, Excel files, transforming and loading it into the target data warehouse.
- Strong experience with Java/J2EE technologies such as Core Java, JDBC, JSP, JSTL, HTML, JavaScript, and JSON.
- Experience developing iterative algorithms using Spark Streaming in Scala to build near real-time dashboards.
- Gained optimum performance through data compression, region splits, and manually managed compaction in HBase.
- Working experience with the MapReduce programming model and the Hadoop Distributed File System.
- Hands-on experience in Unix/Linux environments, including software installations/upgrades, shell scripting for job automation, and other maintenance activities.
- Hands-on experience in installing, configuring and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Zookeeper, Storm, and Spark.
- Thorough knowledge and experience in SQL and PL/SQL concepts.
- Expertise in setting up standards and processes for Hadoop based application design and implementation.
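
A minimal Scala sketch of the RDD-based in-memory computation noted above; the input path, record layout, and column positions are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object RddAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("RddAggregation").getOrCreate()
    val sc = spark.sparkContext

    // Load raw CSV-style records into an RDD and cache them in memory
    val records = sc.textFile("hdfs:///data/transactions.csv").cache()

    // Parse (customerId, amount) pairs and aggregate totals per customer in memory
    val totals = records
      .map(_.split(","))
      .filter(_.length >= 2)
      .map(cols => (cols(0), cols(1).toDouble))
      .reduceByKey(_ + _)

    totals.take(10).foreach(println)
    spark.stop()
  }
}
```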
PROFESSIONAL EXPERIENCE
HADOOP DEVELOPER
Confidential
Responsibilities:
- Experience with the AWS platform and its features, including IAM, EC2, and Lambda; deployed AWS Lambda code from Amazon S3 buckets. Created a Lambda deployment function and configured it to receive events from an S3 bucket.
- Responsible for building scalable distributed data solutions on Cloudera distributed Hadoop.
- Used Spark Streaming and Spark jobs for ongoing customer transactions, and Spark SQL to handle structured data in Hive.
- Migrated tables from RDBMS into Hive tables using Sqoop and later generated visualizations using Tableau.
- Created and maintained automated ETL processes with a special focus on data flow, error recovery, exception handling, and reporting.
- Conducted business/data flow modeling and generated applicable scenarios for the technology functionality testing team.
- Gathered requirements from the client and estimated timelines for developing complex queries using Hive and Impala for a logistics application.
- Wrote PySpark code to calculate aggregates such as mean, covariance, and standard deviation.
- Analyzed the Hadoop cluster and different Big Data components including Pig, Hive, Storm, Spark, HBase, Kafka, Elasticsearch, databases, and Sqoop.
- Integrated Kafka with Flume in a sandbox environment using a Kafka source and Kafka sink.
- Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and preprocessing.
- Completed project documentation including diagrams, data flows, status and usage reports, and support and escalation processes.
- Developed the Insight Store data model for Cassandra, which was used to store the transformed data.
- Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Wrote UDFs in Scala and used them for sampling large data sets (see the sketch following this role).
- During the data exploration stage, used Hive and Impala to gain insights into the customer data.
- Secured the Kafka cluster: implemented Kafka security features using SSL without Kerberos, then set up Kerberos with users and groups to enable more fine-grained, advanced security, and integrated Apache Kafka for data ingestion (see the Kafka sketch following this role).
- Implemented highly scalable and robust ETL processes using AWS (EMR, CloudWatch, IAM, EC2, S3, Lambda functions, DynamoDB).
- Used various data formats while loading data into HDFS.
- Worked in AWS environment for development and deployment of custom Hadoop applications.
- Developed NiFi workflows for data ingestion from multiple sources. Participated in architecture and design discussions with the technical team and interfaced with other teams to create efficient and consistent solutions.
- Created shell scripts to simplify the execution of other scripts (Pig, Hive, Sqoop, Impala, and MapReduce) and to move data into and out of HDFS.
- Created files and tuned SQL queries in Hive using Hue.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs (see the sketch following this role).
- Worked with the Spark ecosystem using Spark SQL and Scala queries on different formats such as text and CSV files.
- Used Apache NiFi in a Kerberized environment to transfer data from relational databases such as MySQL to HDFS.
- Wrote various key queries in Elasticsearch for effective data retrieval.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data, and managed data from different sources.
- Ingested data into HBase using the HBase shell as well as the HBase client API (see the sketch following this role).
- Used Pig as an ETL tool to perform transformations with joins and pre-aggregations before storing the data in HDFS.
- Used Kafka to build a customer activity tracking pipeline as a set of real-time publish-subscribe feeds.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved the review process and resolved technical problems.
- Developed complete end-to-end Big Data processing in the Hadoop ecosystem.
Environment: Hadoop, HDFS, Hive, Sqoop, Oozie, Spark, Scala, Kafka, Python, Cloudera, Linux
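
A minimal sketch of a Scala UDF of the kind referenced above, used to tag records for sampling; the table name, key column, and sampling rate are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object SamplingUdf {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SamplingUdf")
      .enableHiveSupport()
      .getOrCreate()

    // UDF that deterministically marks roughly 1% of rows based on a hash of the key
    val sampleFlag = udf((key: String) => math.abs(key.hashCode % 100) == 0)

    val events = spark.table("warehouse.customer_events")   // hypothetical Hive table
    val sampled = events.filter(sampleFlag(events("customer_id")))

    sampled.write.mode("overwrite").saveAsTable("warehouse.customer_events_sample")
    spark.stop()
  }
}
```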
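A sketch of a Kafka producer configured for SSL, along the lines of the security setup described above; the broker address, topic, and truststore path are hypothetical, and the Kerberos (SASL) settings are omitted.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object SecureActivityProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9093")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    // SSL-only security (no Kerberos); paths and password are placeholders
    props.put("security.protocol", "SSL")
    props.put("ssl.truststore.location", "/etc/kafka/client.truststore.jks")
    props.put("ssl.truststore.password", "changeit")

    val producer = new KafkaProducer[String, String](props)
    producer.send(new ProducerRecord[String, String]("customer-activity", "cust-0001", "login"))
    producer.close()
  }
}
```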
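A sketch of converting a Hive query into equivalent Spark transformations, as described above; the table and column names are assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum}

object HiveToSpark {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveToSpark")
      .enableHiveSupport()
      .getOrCreate()

    // Original HiveQL, executed through Spark SQL
    val viaSql = spark.sql(
      """SELECT customer_id, SUM(amount) AS total_amount
        |FROM warehouse.orders
        |WHERE status = 'SHIPPED'
        |GROUP BY customer_id""".stripMargin)

    // The same logic expressed as DataFrame transformations
    val viaDf = spark.table("warehouse.orders")
      .filter(col("status") === "SHIPPED")
      .groupBy("customer_id")
      .agg(sum("amount").as("total_amount"))

    viaSql.show(5)
    viaDf.show(5)
    spark.stop()
  }
}
```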
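A sketch of writing a row through the HBase client API mentioned above; the table name, column family, and row key are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBasePutExample {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()   // picks up hbase-site.xml from the classpath
    val connection = ConnectionFactory.createConnection(conf)
    try {
      val table = connection.getTable(TableName.valueOf("customer_activity"))
      // One cell: row key = customer id, column family "cf", qualifier "last_event"
      val put = new Put(Bytes.toBytes("cust-0001"))
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("last_event"), Bytes.toBytes("login"))
      table.put(put)
      table.close()
    } finally {
      connection.close()
    }
  }
}
```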
HADOOP DEVELOPER
Confidential, Detroit, MI
Responsibilities:
- Analyzed the Hadoop cluster and different big data analytics tools including Pig, Hive, and Sqoop.
- Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster processing of data.
- Converted existing MapReduce jobs into Spark transformations and actions using Spark RDDs, DataFrames, and Spark SQL APIs.
- Worked on Big Data infrastructure for batch processing as well as real-time processing. Responsible for building scalable distributed data solutions using Hadoop.
- Developed Spark jobs and Hive jobs to summarize and transform data.
- Implemented Spark Scala applications using higher-order functions for both batch and interactive analysis requirements.
- Experienced in developing Spark scripts for data analysis in Scala.
- Used Spark-Streaming APIs to perform necessary transformations.
- Involved in converting Hive/SQL queries into Spark transformations using Spark SQL and Scala.
- Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and Spark.
- Wrote new Spark jobs in Scala to analyze customer and sales history data.
- Involved in requirement analysis, design, coding, and implementation phases of the project.
- Used Spark API over Hadoop YARN to perform analytics on data in Hive.
- Experience with both SQLContext and SparkSession.
- Developed Scala-based Spark applications for data cleansing, data aggregation, de-normalization, and data preparation consumed by machine learning and reporting teams (see the sketch following this role).
- Troubleshot Spark applications to make them more error-tolerant.
- Performed HDFS maintenance and loading of structured and unstructured data; imported data from mainframe datasets to HDFS using Sqoop and wrote a PySpark script to process the HDFS data.
- Extensively worked on the core and Spark SQL modules of Spark.
- Worked with Spark and Spark Streaming, creating RDDs and applying transformations and actions.
- Created partitioned Hive tables and loaded data using both static and dynamic partitioning (see the sketch following this role).
- Implemented POCs on migrating to Spark Streaming to process live data.
- Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
- Ingested data from RDBMS, performed data transformations, and then exported the transformed data to HDFS per the business requirements.
- Used Impala to read, write and query the data in HDFS.
- Stored the output files for export on HDFS, where they were later picked up by downstream systems.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze customer behavioral data.
- Developed real-time data processing applications in Scala using Apache Spark Streaming, consuming from streaming sources such as Kafka (see the sketch following this role).
Environment: Hadoop 2.x, Spark Core, Spark SQL, Spark API, Spark Streaming, PySpark, Hive, Oozie, Amazon
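
A minimal sketch of a Scala Spark application for the kind of data cleansing and aggregation described above; the input path, schema, and output table are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, countDistinct, lower, sum, trim}

object CleanseAndAggregate {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CleanseAndAggregate")
      .enableHiveSupport()
      .getOrCreate()

    // Read raw sales records (hypothetical path and columns)
    val raw = spark.read.option("header", "true").csv("hdfs:///staging/sales/*.csv")

    // Cleansing: normalize strings, drop rows missing key fields, de-duplicate
    val cleansed = raw
      .withColumn("region", lower(trim(col("region"))))
      .filter(col("customer_id").isNotNull && col("amount").isNotNull)
      .dropDuplicates("order_id")

    // Aggregation for reporting consumers
    val summary = cleansed
      .groupBy("region")
      .agg(sum(col("amount").cast("double")).as("total_amount"),
           countDistinct("customer_id").as("customers"))

    summary.write.mode("overwrite").saveAsTable("reporting.sales_summary")
    spark.stop()
  }
}
```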
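A sketch of the static versus dynamic partition loading mentioned above, expressed as HiveQL run through Spark; the table and column names are assumptions.

```scala
import org.apache.spark.sql.SparkSession

object PartitionLoads {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PartitionLoads")
      .enableHiveSupport()
      .getOrCreate()

    // Static partition: the partition value is fixed in the statement
    spark.sql(
      """INSERT OVERWRITE TABLE warehouse.sales_by_day PARTITION (sale_date = '2018-01-01')
        |SELECT order_id, customer_id, amount
        |FROM warehouse.sales_staging
        |WHERE sale_date = '2018-01-01'""".stripMargin)

    // Dynamic partition: Hive derives the partition value from the last selected column
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT OVERWRITE TABLE warehouse.sales_by_day PARTITION (sale_date)
        |SELECT order_id, customer_id, amount, sale_date
        |FROM warehouse.sales_staging""".stripMargin)

    spark.stop()
  }
}
```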
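A sketch of consuming Kafka with the Spark Streaming (DStream) API in Scala, along the lines of the pipeline described above; the broker address, topic, group id, and batch interval are hypothetical.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object ActivityStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ActivityStream")
    val ssc = new StreamingContext(conf, Seconds(10))   // 10-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "activity-consumer",
      "auto.offset.reset" -> "latest"
    )

    // Direct stream from the (hypothetical) customer-activity topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("customer-activity"), kafkaParams))

    // Count events per batch and print a sample
    stream.map(_.value).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```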