Hadoop Developer Resume
Detroit, MI
PROFESSIONAL SUMMARY:
- Around 8 years of professional IT experience in Big Data environments and the Hadoop ecosystem, with strong experience in Spark, SQL, and Java development.
- Hands-on experience across the Hadoop ecosystem, including extensive work with Big Data technologies such as HDFS, MapReduce, YARN, Spark, Sqoop, Hive, Pig, Impala, Oozie, Oozie Coordinator, ZooKeeper, Apache Cassandra, and HBase.
- Experience using tools such as Sqoop, Flume, Kafka, NiFi, and Pig to ingest structured, semi-structured, and unstructured data into the cluster.
- Designed both time-driven and data-driven automated workflows using Oozie and used ZooKeeper for cluster coordination.
- Experience with Hadoop clusters using Cloudera CDH and Hortonworks HDP.
- Developed highly optimized Spark applications to perform various data cleansing, validation, transformation, and summarization activities according to requirements.
- Built data pipelines consisting of Spark, Hive, Sqoop, and custom-built input adapters to ingest, transform, and analyze operational data.
- Developed Spark jobs and Hive jobs to summarize and transform data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Python.
- Experience in working with structured data using HiveQL, join operations, Hive UDFs, partitions, bucketing and internal/external tables.
- Expertise in writing MapReduce jobs in Java and Python for processing large sets of structured, semi-structured, and unstructured data and storing the results in HDFS.
- Experience working with Python, UNIX and shell scripting.
- Experience in Extraction, Transformation and Loading (ETL) of data from multiple sources like Flat files and Databases.
- Good knowledge of cloud integration with AWS, using Elastic MapReduce (EMR), Simple Storage Service (S3), EC2, and Redshift, as well as Microsoft Azure.
- Experience with complete Software Development Life Cycle (SDLC) process which includes Requirement Gathering, Analysis, Designing, Developing, Testing, Implementing and Documenting.
- Worked with Waterfall and Agile methodologies.
- Good team player with excellent communication skills and a strong attitude toward learning new technologies.
- Spark & Real Time Streaming
- Hands-on experience with Spark architecture and its components, including the Spark SQL, DataFrame, and Dataset APIs.
- Worked on Spark to improve the performance of existing Hadoop processing using SparkContext, Spark SQL, DataFrames, and RDDs.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Spark SQL, and Python (a minimal sketch follows this summary).
- Hands-on experience reading Hive tables from Spark, performing transformations, and creating DataFrames on Hive tables.
- Used Spark Structured Streaming to perform necessary transformations.
- Expertise in converting MapReduce programs into Spark transformations using Spark RDDs.
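The following is a minimal, illustrative PySpark sketch of the kind of Hive-to-Spark conversion described above. The database, table, and column names (sales_db.orders, region, amount) are hypothetical placeholders, not details from an actual engagement.

```python
# Minimal sketch: rewriting a HiveQL aggregation as Spark DataFrame transformations.
# All table and column names below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-to-dataframe-example")
         .enableHiveSupport()
         .getOrCreate())

# Equivalent HiveQL:
#   SELECT region, SUM(amount) AS total_amount
#   FROM sales_db.orders
#   GROUP BY region
orders = spark.table("sales_db.orders")

totals = (orders
          .groupBy("region")
          .agg(F.sum("amount").alias("total_amount")))

totals.show()
```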
TECHNICAL SKILLS:
Hadoop: HDFS, MapReduce, Hive, Beeline, Sqoop, Flume, Oozie, Impala, Pig, Kafka, ZooKeeper, NiFi, Cloudera Manager, Hortonworks
Spark Components: Spark Core, Spark SQL (DataFrames and Datasets), Scala, Python
Programming Languages: Core Java, Scala, Shell, HiveQL, Python
Web Technologies: HTML, jQuery, Ajax, CSS, JSON, JavaScript
Operating Systems: Linux, Ubuntu, Windows 10/8/7
Databases: Oracle, MySQL, SQL Server; NoSQL databases: HBase, Cassandra, MongoDB
Cloud: AWS CloudFormation, Azure
Version Controls and Tools: Git, Maven, SBT, CBT
Methodologies: Agile, Waterfall
IDEs & Command Line Tools: Eclipse, NetBeans, IntelliJ
WORK EXPERIENCE:
Confidential, Detroit, MI
Hadoop Developer
Responsibilities:
- Worked with product owners, designers, QA, and other engineers in an Agile development environment to deliver timely solutions per customer requirements.
- Transferred data from different data sources into HDFS using Kafka producers, consumers, and brokers.
- Developed highly optimized Spark applications to perform various data cleansing, validation, transformation, and summarization activities according to requirements.
- Built data pipelines consisting of Spark, Hive, Sqoop, and custom-built input adapters to ingest, transform, and analyze operational data.
- Developed Spark jobs and Hive jobs to summarize and transform data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Python.
- Used Oozie for automating end-to-end data pipelines and Oozie coordinators for scheduling the workflows.
- Involved in creating Hive tables, loading data, and writing Hive queries and views using HiveQL.
- Optimized Hive queries using map-side joins, dynamic partitions, and bucketing (illustrative sketch after this list). Applied Hive queries over HBase-backed SerDe tables to perform data analysis and meet the data requirements of downstream applications.
- Responsible for executing Hive queries using the Hive command line, the Hue web GUI, and Impala to read, write, and query data in HBase.
- Implemented MapReduce secondary sorting to get better performance when sorting results in MapReduce programs.
- Loaded and transformed large sets of structured and semi-structured data, including Avro and sequence files. Worked on migrating all existing jobs to Spark to improve performance and reduce execution time.
- Implemented usage of Amazon EMR for processing Big Data across a Hadoop Cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
- Used Hive join queries to join multiple tables of a source system and loaded the results into Elasticsearch.
- Experience with the ELK stack for building quick search and visualization capabilities over data.
- Experience with data formats such as JSON, Avro, Parquet, and ORC, and compression codecs such as Snappy and bzip2.
- Coordinated with the testing team for bug fixes and created documentation for recorded data, agent usage and release cycle notes.
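A hedged PySpark sketch of the Hive-style optimizations referenced above (dynamic partitions, bucketing, and a map-side/broadcast join). Database, table, and column names such as staging.events and warehouse.events_enriched are hypothetical, and the bucketed output is written as a Spark-managed table rather than any specific production schema.

```python
# Sketch of partitioning, bucketing, and a broadcast (map-side) join in PySpark.
# All database, table, and column names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("hive-optimization-example")
         .enableHiveSupport()
         .getOrCreate())

# Hive dynamic-partition settings, relevant when inserting into existing
# partitioned Hive tables from Spark SQL.
spark.conf.set("hive.exec.dynamic.partition", "true")
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

events = spark.table("staging.events")
small_dim = spark.table("staging.event_types")

# Map-side join: broadcast the small dimension table to every executor.
enriched = events.join(F.broadcast(small_dim), "event_type_id")

# Write a partitioned, bucketed managed table for downstream queries.
(enriched.write
 .mode("overwrite")
 .partitionBy("event_date")
 .bucketBy(16, "user_id")
 .sortBy("user_id")
 .saveAsTable("warehouse.events_enriched"))
```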
Environment: Hadoop, Big Data, HDFS, Scala, Python, Oozie, Hive, HBase, NiFi, Impala, Spark, AWS, Linux.
Confidential
Hadoop Developer
Responsibilities:
- Developed an enterprise data warehouse (EDW) solution: a cloud-based EDW and data lake that supports data asset management, data integration, and continuous analytic discovery workloads.
- Developed and implemented real-time data pipelines with Spark Streaming, Kafka, and Cassandra to replace the existing lambda architecture without losing its fault-tolerant capabilities.
- Created a Spark Streaming application to consume real-time data from Kafka sources and applied real-time data analysis models that can be updated on new data as it arrives in the stream.
- Worked on importing and transforming large sets of structured, semi-structured, and unstructured data.
- Used Spark Structured Streaming to perform the necessary transformations and build a data model that consumes data from Kafka in real time and persists it into HDFS (illustrative sketch after this list).
- Implemented workflows using the Apache Oozie framework to automate tasks. Used ZooKeeper to coordinate cluster services.
- Created various Hive external and staging tables and joined them as per requirements.
- Implemented static partitioning, dynamic partitioning, and bucketing in Hive using internal and external tables. Used map-side joins and parallel execution to optimize Hive queries.
- Developed and implemented custom Hive and Spark UDFs for date transformations such as date formatting and age calculation, per business requirements.
- Wrote Spark programs in Scala and Python for data quality checks.
- Wrote transformations and actions on DataFrames and used Spark SQL on DataFrames to access Hive tables from Spark for faster data processing.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Used Spark optimization techniques such as caching/refreshing tables, broadcast variables, coalesce/repartition, increased memory overhead limits, tuned parallelism, and modified default Spark configuration variables for performance tuning.
- Performed various benchmarking steps to optimize Spark job performance and improve overall processing. Worked in an Agile environment, delivering the agreed user stories within the sprint.
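Below is an illustrative Structured Streaming sketch of the Kafka-to-HDFS pattern described above: consume JSON events from Kafka, apply light transformations, and persist partitioned Parquet to HDFS. The broker address, topic, schema, and paths are assumptions for the example only.

```python
# Sketch: Spark Structured Streaming from Kafka to HDFS (Parquet).
# Broker, topic, schema, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-to-hdfs-stream").getOrCreate()

schema = StructType([
    StructField("device_id", StringType()),
    StructField("reading", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "sensor-events")
       .load())

# Kafka delivers the payload as bytes in the 'value' column; parse it as JSON.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(F.from_json("json", schema).alias("e"))
          .select("e.*")
          .withColumn("event_date", F.to_date("event_time")))

query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/sensor_events")
         .option("checkpointLocation", "hdfs:///checkpoints/sensor_events")
         .partitionBy("event_date")
         .outputMode("append")
         .start())

query.awaitTermination()
```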
Environment: Hadoop, HDFS, Hive, Sqoop, Oozie, Spark, Scala, Kafka, Python, Cloudera, Linux.
Confidential
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using a Hadoop cluster environment with the Hortonworks distribution.
- Used Sqoop to load the data from relational databases.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs. Worked with CSV, JSON, Avro, and Parquet file formats.
- Implemented usage of Amazon EMR for processing Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
- Worked on Kafka to collect and load data onto Hadoop file systems.
- Used Hive as an abstraction on top of structured data residing in HDFS and implemented partitions and buckets on Hive tables.
- Developed and implemented real-time data pipelines with Spark Streaming.
- Designed and developed data integration programs in a Hadoop environment with the NoSQL data store HBase for data access and analysis.
- Worked with Python to develop analytical jobs using Spark's PySpark API (see the sketch after this list).
- Used the Apache Oozie job scheduler to execute workflows.
- Used Ambari to monitor node health and job status and to run analytics jobs on Hadoop clusters.
- Experience with PySpark, using Spark libraries through Python scripting for data analysis.
- Worked on Tableau to build customized interactive reports, worksheets, and dashboards.
- Involved in performance tuning of Spark jobs using caching and by taking full advantage of the cluster environment.
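A minimal PySpark sketch of the kind of analytical job described above: read a Hive table previously imported with Sqoop, cache it for reuse, aggregate, and write partitioned Parquet. Table, column, and path names (imports.transactions, txn_ts, store_id) are hypothetical.

```python
# Sketch of a PySpark analytical job over a Sqoop-imported Hive table.
# Table, column, and path names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("pyspark-analytics-example")
         .enableHiveSupport()
         .getOrCreate())

# Table assumed to have been imported from a relational source with Sqoop.
txns = spark.table("imports.transactions").cache()

daily = (txns
         .withColumn("txn_date", F.to_date("txn_ts"))
         .groupBy("txn_date", "store_id")
         .agg(F.count("*").alias("txn_count"),
              F.sum("amount").alias("total_amount")))

(daily.coalesce(8)  # reduce the number of small output files
      .write.mode("overwrite")
      .partitionBy("txn_date")
      .parquet("hdfs:///analytics/daily_store_totals"))
```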
Environment: Hadoop, Spark, Scala, Python, Kafka, Hive, Sqoop, PySpark, Ambari, Oozie, HBase, Tableau, Jenkins, Hortonworks
Confidential
Java Developer
Responsibilities:
- Designed and developed Web Services using Java/J2EE in WebLogic environment.
- Developed web pages using Java Servlets, JSP, CSS, JavaScript, DHTML, and HTML.
- Added extensive Struts validation.
- Wrote Ant scripts to build and deploy the application.
- Involved in the analysis, design, development, and unit testing of business requirements.
- Developed business logic in JAVA/J2EE technology.
- Implemented business logic and generated WSDL for those web services using SOAP. Worked on developing JSP pages.
- Implemented Struts Framework.
- Developed Business Logic using Java/J2EE.
- Modified Stored Procedures in Oracle Database.
- Developed the application using Spring Web MVC framework.
- Worked with Spring Configuration files to add new content to the website.
- Worked on the Spring DAO module and ORM using Hibernate.
- Used HibernateTemplate and HibernateDaoSupport for Spring-Hibernate communication.
- Configured association mappings such as one-to-one and one-to-many in Hibernate.
- Worked with JavaScript calls, as the search is triggered through JS calls when a search key is entered in the search window. Analyzed other search engines to make use of best practices.
- Collaborated with the business team to fix defects. Worked on XML, XSL, and XHTML files.
- Worked as part of the team to develop and maintain an advanced search engine.
Environment: Java 1.6, J2EE, Eclipse SDK 3.3.2, Spring 3.x, jQuery, Oracle 10i, Hibernate, JPA, JSON, Apache Ivy, SQL, stored procedures, shell scripting, XML