
Hadoop Developer Resume


SFO, CA

SUMMARY:

  • Around 10 years of professional experience in IT, including Analysis, Design, Coding, Testing, Implementation and Training in Java and Big Data technologies, working with Apache Hadoop ecosystem components, Spark Streaming and Amazon Web Services (AWS).
  • Progressive experience in all phases of the iterative Software Development Life Cycle (SDLC) and Agile methodologies.
  • Actively involved in Requirements Gathering, Analysis, Development, Unit Testing and Integration Testing.
  • Extensive experience with, and in-depth understanding of, Hadoop architecture and its ecosystem components such as HDFS, YARN, MapReduce, Hive, Pig, HBase, Sqoop, ZooKeeper, Oozie and Flume.
  • Hands-on experience with AWS components such as EC2, EMR, S3 and Elasticsearch.
  • Expertise in writing Hadoop jobs for analyzing data using Spark, Hive, Pig and MapReduce.
  • Good understanding of HDFS design, daemons and HDFS High Availability (HA).
  • Good understanding of and working experience with Hadoop distributions such as Cloudera and Hortonworks.
  • Good knowledge of creating event-processing data pipelines using Flume, Kafka and Storm.
  • Expertise in data transformation and analysis using Spark, Pig and Hive.
  • Built and configured Apache Tez on Hive and Pig to achieve better response times when running MR jobs.
  • Experience in importing and exporting terabytes of data between HDFS and Relational Database Systems (RDBMS) using Sqoop.
  • Extended Hive and Pig core functionality by writing custom UDFs, UDTFs and UDAFs.
  • Experience in analyzing large-scale data to identify new analytics, insights, trends and relationships, with a strong focus on data clustering.
  • Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text, Avro and Parquet files.
  • Hands-on experience with Avro and Parquet file formats, dynamic partitioning and bucketing for best practices and performance improvement.
  • Developed Spark SQL programs for handling different data sets with better performance (a representative sketch follows this list).
  • Good knowledge of building event-processing pipelines using Spark Streaming.
  • Experience processing semi-structured data (XML, JSON and CSV) in Hive/Impala.
  • Good working experience with Hadoop cluster architecture and cluster monitoring. In-depth understanding of data structures and algorithms.
  • Experience in using ZooKeeper and Oozie operational services to coordinate clusters and schedule workflows.
  • Excellent understanding and knowledge of NoSQL databases such as HBase and MongoDB.
  • Experience in implementing standards and processes for Hadoop-based application design and implementation.
  • Worked with cloud services such as Amazon Web Services (AWS) and was involved in ETL, Data Integration and Migration.
  • Extensive experience with JavaScript for client-side validation; implemented AJAX with JavaScript to reduce data transfer overhead between client and server.
  • Extensive experience working with Oracle, SQL Server and MySQL databases. Hands-on experience in application development using Java and RDBMS.
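
Representative Spark SQL sketch (Scala, Spark 2.x SparkSession API) of the Parquet handling, dynamic partitioning and bucketing described above; the HDFS path, database/table and column names are illustrative placeholders, not actual project objects:

    import org.apache.spark.sql.{SaveMode, SparkSession}

    object ParquetToHive {
      def main(args: Array[String]): Unit = {
        // Read raw Parquet events and publish them as a partitioned, bucketed
        // Hive table for downstream HiveQL reporting.
        val spark = SparkSession.builder()
          .appName("parquet-to-hive")
          .enableHiveSupport()
          .getOrCreate()

        val events = spark.read.parquet("/data/raw/events")      // placeholder HDFS path

        events
          .filter("event_type is not null")                      // basic cleansing
          .write
          .mode(SaveMode.Overwrite)
          .partitionBy("event_date")                             // dynamic partitioning by date
          .bucketBy(32, "client_id")                             // bucketing for faster joins
          .sortBy("client_id")
          .saveAsTable("analytics.events_curated")               // placeholder Hive table

        spark.stop()
      }
    }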

TECHNICAL SKILLS:

Hadoop/Big Data: HDFS, MapReduce, Spark Core, Spark Streaming, Spark SQL, Hive, Tez, Pig, Sqoop, Flume, Kafka, Oozie, NiFi, ZooKeeper, Docker.

AWS Components: EC2, S3, RDS, Redshift, EMR, DynamoDB, Lambda, SNS, SQS

NoSQL Databases: HBase, Cassandra, MongoDB

Languages: C, C++, Java, Scala, J2EE, Python, PL/SQL, Pig Latin, HiveQL, UNIX shell scripts

Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery

Frameworks: MVC, Struts, Spring, Hibernate

Operating Systems: Sun Solaris, HP-UX, Red Hat Linux, Ubuntu Linux and Windows XP/Vista/7/8

Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata

Tools and IDEs: Eclipse, NetBeans, Toad, Maven, SBT, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer

Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP

PROFESSIONAL EXPERIENCE:

Confidential, SFO, CA

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Spark.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly for building the common word2vec data model, which consumes data from Kafka in near real time and persists it into Cassandra (see the streaming sketch after this list).
  • Operated the cluster on AWS using EC2, EMR, S3 and Elasticsearch.
  • Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
  • Developed Spark scripts using the Scala shell as per requirements.
  • Migrated existing MapReduce programs to Spark using Scala and Python.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Built Spark 1.6.1 from source on YARN to match the production Cloudera (CDH 5.7) Hadoop 2.7 version.
  • Developed Scala scripts and UDFs using DataFrames in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Implemented proofs of concept on the Hadoop stack and different big data analytics tools, including migration from different databases.
  • Experienced in performance tuning of Spark applications: setting the right batch interval time, the correct level of parallelism and memory tuning.
  • Optimized existing word2vec algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs during development of a chatbot using OpenNLP and Word2Vec.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
  • Hands-on experience with various AWS services such as Redshift clusters and Route 53 domain configuration.
  • Built an on-demand, secure EMR launcher with custom spark-submit steps using S3 events, SNS, KMS and Lambda functions.
  • Extensive working knowledge of NiFi.
  • Used AWS services like EC2 and S3 for small data sets.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, and effective and efficient joins and transformations during the ingestion process itself.
  • Created Hive tables, loaded the data using Sqoop and worked on them using HiveQL.
  • Responsible for developing custom UDFs, UDAFs and UDTFs in Pig and Hive.
  • Optimized Hive queries using various file formats such as JSON, Avro, ORC and Parquet.
  • Implemented Spark SQL to connect to Hive, read the data and distribute processing for high scalability.
  • Analyzed tweet JSON data using the Hive SerDe API to deserialize it and convert it into a readable format.
  • Experience in job management using the Fair Scheduler; developed job-processing scripts using Oozie workflows to run multiple Spark jobs in sequence for processing data.
  • Built Tez from source and configured it on Hive, achieving very good response times (< 1 min) for large Hive queries that previously took much longer (> 30 mins).
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Processed application web logs using Flume and loaded them into Hive for data analysis.
  • Generated different types of reports using HiveQL for the business to analyze data feeds from sources.
  • Implemented RESTful Web Services to interact with Oracle/Cassandra to store/retrieve the data.
  • Generated detailed design documentation for the source-to-target transformations.
  • Wrote UNIX scripts to monitor data load/transformation.
  • Involved in the iteration planning process under the Agile Scrum methodology.
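
Trimmed-down Scala sketch of the Kafka-to-Cassandra streaming flow referenced above, using the Spark 1.6 direct-stream API and the DataStax spark-cassandra-connector; the broker address, topic, keyspace/table names and record layout are assumed placeholders, and the word2vec model-building step is omitted:

    import com.datastax.spark.connector.streaming._            // adds saveToCassandra on DStreams
    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.kafka.KafkaUtils
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object EventStreamToCassandra {
      // Placeholder row layout for the records persisted to Cassandra.
      case class Event(userId: String, message: String, ts: Long)

      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("kafka-to-cassandra")
          .set("spark.cassandra.connection.host", "cassandra-host")    // placeholder host

        val ssc = new StreamingContext(conf, Seconds(10))               // 10-second batches

        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // placeholder broker
        val topics = Set("chat-events")                                 // placeholder topic

        // Receiver-less direct stream, as used with Spark 1.6 / Kafka 0.8.
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, topics)

        stream
          .map { case (_, value) =>
            // Each record is assumed to be "userId|message|timestamp".
            val Array(userId, message, ts) = value.split("\\|", 3)
            Event(userId, message, ts.toLong)
          }
          .saveToCassandra("chat", "events")                            // placeholder keyspace/table

        ssc.start()
        ssc.awaitTermination()
      }
    }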

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Spark, Oozie, ZooKeeper, AWS, RDBMS/DB, MySQL, CSV, Avro data files.

Confidential, Carrollton, TX

Hadoop/Spark Developer

Responsibilities:

  • Involved in Design and Development of technical specifications.
  • Developed multiple Spark jobs in PySpark for data cleaning and preprocessing (a representative cleaning sketch, in Scala, follows this list).
  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Involved in creating Hive tables, and loading and analyzing data using Hive queries.
  • Developed simple/complex MapReduce jobs using Hive and Pig.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Involved in running Hadoop jobs for processing millions of records of text data.
  • Worked with application teams to install Operating Systems, Hadoop updates, patches, and version upgrades as required.
  • Responsible for managing data from multiple data sources.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML format data.
  • Experience in optimizing MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization.
  • Processed HDFS data and created external tables using Hive and developed scripts to ingest and repair tables that can be reused across the project.
  • Developed merge jobs in Python to extract and load data from a MySQL database to HDFS.
  • Developed Pig scripts, Pig UDFs, Hive scripts and Hive UDFs to load data files.
  • Developed Python scripts to monitor the health of MongoDB databases and perform ad-hoc backups using mongodump and mongorestore.
  • Wrote Pig UDFs for converting date and timestamp formats from the unstructured files to the required date formats and processed the same.
  • Created 30 buckets for each Hive table, clustered by client ID, for better performance (optimization) when updating the tables.
  • Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
  • Involved in emitting processed data from Hadoop to relational databases or external file systems using Sqoop, HDFS get or copyToLocal.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
  • Expert in importing and exporting data into HDFS using Sqoop and Flume.
  • Experience in using Sqoop to migrate data back and forth between HDFS and MySQL or Oracle, and deployed Hive and HBase integration to perform OLAP operations on HBase data.
  • Wrote shell scripts to pull data from the Tumbleweed server to the Cornerstone staging area.
  • Closely worked with Hadoop security team and infrastructure team to implement security.
  • Implemented authentication and authorization services using the Kerberos authentication protocol.
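
The cleaning and preprocessing jobs on this project were written in PySpark; for consistency with the other sketches, the following shows an equivalent DataFrame-based cleaning pass in Scala (Spark 2.x API). The input path, delimiter and column names are assumptions:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, to_date, trim}

    object CleanCustomerFeed {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("clean-customer-feed").getOrCreate()

        // Placeholder input: a pipe-delimited customer feed landed on HDFS.
        val raw = spark.read
          .option("header", "true")
          .option("delimiter", "|")
          .csv("/data/landing/customer_feed")

        val cleaned = raw
          .na.drop(Seq("customer_id"))                           // drop rows missing the key
          .dropDuplicates("customer_id")                         // de-duplicate on the key
          .withColumn("name", trim(col("name")))                 // strip stray whitespace
          .withColumn("signup_date", to_date(col("signup_date"), "MM/dd/yyyy")) // normalize dates
          .filter(col("signup_date").isNotNull)                  // reject unparseable dates

        cleaned.write.mode("overwrite").parquet("/data/curated/customer_feed")
        spark.stop()
      }
    }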

Environment: Hadoop, MapReduce, Hive, Pig, Spring Batch, Scala, Sqoop, Bash scripting, Spark RDD, Spark SQL.

Confidential, Bloomington, IL

Hadoop Developer

Responsibilities:

  • Gathered data from multiple sources such as Teradata, Oracle and SQL Server using Sqoop and loaded it into HDFS.
  • Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
  • Responsible for cleansing and validating data.
  • Responsible for writing MapReduce jobs that join the incoming slices of data and pick only the fields needed for further processing (a simplified mapper sketch follows this list).
  • Found the right join conditions and created datasets conducive to data analysis.
  • Involved in loading data from UNIX file system to HDFS.
  • Installed and configured Hive, and wrote Hive UDFs.
  • Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
  • Devised procedures that solve complex business problems with due considerations for hardware/software capacity and limitations, operating times and desired results.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Provided quick response to ad hoc internal and external client requests for data and experienced in creating ad hoc reports.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
  • Worked hands-on with the ETL process.
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
  • Extracted the data from Teradata into HDFS using Sqoop.
  • Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior such as shopping enthusiasts, travelers and music lovers.
  • Wrote REST Web services to expose the business methods to external services.
  • Exported the patterns analyzed back into Teradata using Sqoop.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Installed the Oozie workflow engine to run multiple Hive jobs.
  • Developed Hive queries to process the data and generate data cubes for visualization.
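
The join/projection MapReduce jobs on this project were written in Java; the following compact mapper sketch (in Scala, for consistency with the other snippets) illustrates the idea of tagging each record with its join key and emitting only the fields needed downstream. The delimiter and field positions are assumptions:

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.Mapper

    // Emits (customerId -> "orders|field1|field2") so the reducer can join
    // slices from different sources on the customer id.
    class ProjectionMapper extends Mapper[LongWritable, Text, Text, Text] {

      private val outKey = new Text()
      private val outValue = new Text()

      override def map(key: LongWritable, value: Text,
                       context: Mapper[LongWritable, Text, Text, Text]#Context): Unit = {
        val fields = value.toString.split("\\|", -1)             // assumed pipe-delimited input
        if (fields.length >= 6) {
          outKey.set(fields(0))                                  // join key: customer id
          outValue.set(Seq("orders", fields(2), fields(5)).mkString("|")) // only the needed fields
          context.write(outKey, outValue)
        }
      }
    }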

Environment: Hadoop, MapReduce, HDFS, Hive, Flume, Sqoop, Cloudera, Oozie, UNIX.

Confidential, Pittsburgh, PA

Hadoop Developer

Responsibilities:

  • Responsible for loading customer data and event logs from Oracle and Teradata databases into HDFS using Sqoop.
  • End-to-end performance tuning of Hadoop clusters and Hadoop MapReduce routines against very large data sets.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
  • Implemented MapReduce programs to analyze large datasets in the warehouse for business intelligence.
  • Wrote Storm spouts and bolts to collect the real-time customer stream data from the Kafka broker, process it and store it into HBase.
  • Analyzed log files and processed them through Flume.
  • Experience in optimizing MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization.
  • Developed HQL queries to implement select, insert, update and delete operations on the database.
  • Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
  • Developed simple to complex Map/Reduce jobs using Java, and scripts using Hive and Pig.
  • Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) for data ingestion
  • Implemented business logic by writing UDFs in Java and used various UDFs from other sources.
  • Experienced in loading and transforming large sets of structured and semi-structured data.
  • Managed and reviewed Hadoop log files; deployed and maintained the Hadoop cluster.
  • Exported filtered data into HBase for fast queries (see the HBase write sketch after this list).
  • Involved in creating Hive tables, loading with data and writing Hive queries.
  • Created data models for customer data using the Cassandra Query Language (CQL).
  • Ran many performance tests using the cassandra-stress tool in order to measure and improve the read and write performance of the cluster.
  • Involved in developing shell scripts to orchestrate execution of all other scripts (Pig, Hive and MapReduce) and to move data files within and outside of HDFS.
  • Queried and analyzed data from DataStax Cassandra for quick searching, sorting and grouping.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop
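
Minimal Scala sketch of writing a filtered record into HBase with the standard client API (HBase 1.x-style Connection/Table); the table name, column family and values are placeholders:

    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}

    object HBaseExport {
      def main(args: Array[String]): Unit = {
        val conf = HBaseConfiguration.create()                   // picks up hbase-site.xml
        val connection = ConnectionFactory.createConnection(conf)
        val table = connection.getTable(TableName.valueOf("customer_events")) // placeholder table

        try {
          // Row key = customer id; column family "d" holds the filtered attributes.
          val put = new Put(Bytes.toBytes("cust-0001"))
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("last_event"), Bytes.toBytes("purchase"))
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("event_ts"), Bytes.toBytes("2016-03-01T10:15:00"))
          table.put(put)
        } finally {
          table.close()
          connection.close()
        }
      }
    }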

Environment: Apache Hadoop (Cloudera), HBase, Hive, Pig, MapReduce, Sqoop, Oozie, Eclipse, Java

Confidential, Jersey City, NJ

Full Stack .Net Developer

Roles and Responsibility:

  • Involved in the Analysis, Design, Coding, Performance Testing and Maintenance phases of the project life cycle.
  • Experienced in developing and consuming Web Services and n-tier web applications using ASP.NET 4.5 and C# in Visual Studio 2012/2013.
  • Hands-on experience in developing and debugging n-tier applications using C#, HTML5, XML and CSS3 on the MVC5 and Facets frameworks.
  • Analyzed and fixed cross-browser defects in the application, which was developed with AngularJS.
  • Developed business logic components in C#, VBScript and LINQ to Objects, and Data Access Layer components using DLINQ and C# for the web module.
  • Involved in creating databases and tables for the given database schemas using SQL Server 2012 R2.
  • Responsible for writing complex SQL queries, stored procedures, triggers and views, and handling the major Oracle database.
  • Responsible for writing complicated business logic and also reviewing the logic written by others.
  • Developed master pages and applied them to all content pages using ASP.NET 4.5.
  • Worked on project migration from VB6 to C#, taking the old applications into account.
  • Developed client-side validation in ASP.NET web pages using JavaScript and ASP.NET validation controls.
  • Worked closely with the front-end engineer to design and tweak the RESTful API used by the frontend.
  • Used Unified Modeling Language (UML) technologies for a complete view of the application, including class diagrams, sequence diagrams and activity diagrams, using Visio 2012 and UI wireframes.
  • Developed the view models and controller action methods to fetch data from the back-end RESTful GET and POST API services.
  • Participated in building a fully scalable, responsive application using Web API, C# and Entity Framework.
  • jQuery validation and MVC 5.0 unobtrusive validation were used to validate form fields and provide custom requirements/error messages through C# attributes in the Model.
  • Responsible for writing lambda expressions and creating delegates and expression trees.
  • Created MVC controllers, models and views according to the client's requirements.
  • Implemented integration of third party services using Windows Communication Foundation (WCF).
  • Used Data Contract as the standard mechanism in the WCF for serializing .NET object types into XML.
  • Used TypeScript in the implementation of a single-page application with AngularJS.
  • Worked on converting database tables to model classes using Entity Framework 6.5; proficient with both the code-first and database-first approaches.
  • Monitored different graphs such as transaction response time and analyzed server performance status, hits per second, throughput, Windows resources, database server resources, etc.
  • Involved in the testing process; the complete project was developed under the Test-Driven Development (TDD) methodology, and JIRA was used for project management and bug tracking.
  • Implemented the HTTP protocol and SSL to secure the information between the Web Service and the client. Followed the EDI protocol in data exchange.
  • Performed multithreaded programming to improve application performance.
  • Involved in maintenance and enhancements of an application using Confidential .NET Framework 4.0, C#.NET, ASP.NET, LINQ, WCF, AJAX, JavaScript, TypeScript, jQuery, XML and Web Services.
  • Attended meetings with the client to understand exactly what they need.

Environment: Visual Studio 2012/2013, ASP.NET 4.5, C#.NET, AngularJS 2.0, ADO.NET, jQuery, SQL, MS SQL Server, XML, Windows Server 2008, Azure, TFS, VB.NET, .NET Framework 4.0, MVC 3.0/4.0, SharePoint 2013, WCF, SSIS, SSRS, Sitecore CMS, IIS, JIRA.

Confidential

Jr Java Developer

Responsibilities:

  • Involved in specification analysis and identifying the requirements.
  • Designed the presentation layer by developing JSP pages for the modules.
  • Developed controllers and JavaBeans encapsulating the business logic.
  • Developed classes to interface with the underlying web services layer.
  • Used patterns including MVC, DAO, DTO, Front Controller, Service Locator and Business Delegate (a compact DAO sketch follows this list).
  • Involved in building PL/SQL queries and stored procedures for database operations.
  • Used Jasper Reports to provide print preview of Financial Reports and Monthly Statements.
  • Used JMeter to carry out performance tests on external web service calls, database connections and other dynamic resources
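
The data access layer on this project was written in Java 1.4 with JDBC; the following compact sketch (in Scala, for consistency with the other snippets) illustrates the DAO/DTO pattern used, with a placeholder table, query and connection settings:

    import java.sql.{Connection, DriverManager}

    // Plain value object (DTO) carried between layers.
    case class Account(id: Long, owner: String, balance: BigDecimal)

    // DAO interface: the rest of the application only sees these methods.
    trait AccountDao {
      def findById(id: Long): Option[Account]
    }

    // JDBC-backed implementation; the URL, credentials and table are placeholders.
    class JdbcAccountDao(url: String, user: String, password: String) extends AccountDao {

      override def findById(id: Long): Option[Account] = {
        val conn: Connection = DriverManager.getConnection(url, user, password)
        try {
          val stmt = conn.prepareStatement("SELECT id, owner, balance FROM accounts WHERE id = ?")
          stmt.setLong(1, id)
          val rs = stmt.executeQuery()
          if (rs.next())
            Some(Account(rs.getLong("id"), rs.getString("owner"),
              BigDecimal(rs.getBigDecimal("balance"))))
          else
            None
        } finally {
          conn.close()                                           // also closes the statement and result set
        }
      }
    }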

Environment: Java 1.4, J2EE 1.4, Servlets, JSP, JDBC, XML, ANT, Apache Tomcat 5.0, Oracle 8i, JUnit, PL/SQL, UML, NetBeans.
