Hadoop Architect/Developer Resume

Redmond, WA

SUMMARY

  • Strong knowledge of stream-processing technologies such as Kafka, RabbitMQ, Apache Storm, and Apache Spark Streaming.
  • Worked on the Spark Core, Spark SQL and Spark Streaming modules of Spark extensively.
  • Worked on Apache Storm, RabbitMQ and Filewatcher extensively.
  • Experience in developing MapReduce programs using Apache Hadoop to analyze big data as per requirements.
  • Experienced with major Hadoop ecosystem projects such as Pig, Hive, and HBase.
  • Extensive experience in developing Pig Latin scripts and using Hive Query Language for data analytics.
  • Good working experience using Sqoop to import data into HDFS from RDBMS and vice versa.
  • Good knowledge of job scheduling and cluster coordination tools such as Oozie and ZooKeeper.
  • Experience in Hadoop administration activities such as installation and configuration of clusters using Apache and Hortonworks.
  • Understanding of the AWS cloud platform and its services, including EC2, VPC, EBS, AMI, SNS, RDS, CloudWatch, CloudTrail, CloudFormation, AWS Config, Auto Scaling, CloudFront, IAM, S3, and Route 53.
  • Designed well-architected plans to migrate on-premises servers to AWS and provided initial support to teams using those resources.
  • Extensive knowledge of creating CloudWatch alarms to notify users and of using metrics to monitor the performance and cost of AWS resources.
  • Experience with designing and configuring secure Virtual Private Clouds (VPCs) across private and public networks in AWS by creating subnets, route tables, network ACLs, and NAT gateways.
  • Working knowledge of EC2, including creating snapshots, volumes, Elastic IPs, and security groups for public and private instances.
  • Provided authenticated access to AWS resources using MFA (Multi-Factor Authentication) and managed users using IAM policies and roles.
  • Hands-on experience in solving software design issues by applying design patterns, including Singleton, Business Delegate, Controller, MVC, Factory, Abstract Factory, DAO, and Template.
  • Working knowledge of relational databases such as Oracle.
  • Strong experience in database design, writing complex SQL Queries and Stored Procedures.
  • Experience in building, deploying, and integrating with Ant, Maven, and Git.
  • Experience in developing logging standards and mechanisms based on Log4j.
  • Experience in designing, deploying virtual networks and upgrading systems, including hardware, software, networks, databases, servers and peripheral equipment.
  • Experience with virtualization technologies, including installing and configuring VMware vSphere and creating, managing, administering, and maintaining virtual servers and clients.
  • Strong work ethic with desire to succeed and make significant contributions to the organization.
  • Strong problem-solving, communication, and interpersonal skills; a good team player.
  • Have the motivation to take independent responsibility as well as ability to contribute and be a productive team member.

TECHNICAL SKILLS

  • Hadoop
  • MapReduce v2 (YARN)
  • HDFS
  • Hive
  • Pig
  • Java
  • SQL
  • Hortonworks
  • Sqoop
  • Oracle
  • MySQL
  • Tableau
  • Talend
  • Elasticsearch
  • Oozie
  • Spark Core
  • Spark SQL
  • Spark Streaming
  • Kafka
  • Flume
  • Eclipse

PROFESSIONAL EXPERIENCE

Confidential, Redmond WA

Hadoop Architect/Developer

Responsibilities:

  • Analyzed and defined the researcher's strategy and determined the system architecture and requirements needed to achieve goals.
  • Developed multiple Kafka producers and consumers as per the software requirement specifications.
  • Used Kafka for log aggregation, collecting physical log files from servers and placing them in a central location such as HDFS for processing.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS.
  • Used various Spark transformations and actions to cleanse the input data.
  • Developed shell scripts to generate Hive CREATE statements from the data and load the data into the tables.
  • Wrote MapReduce jobs using the Java API and Pig Latin.
  • Optimized HiveQL and Pig scripts by using execution engines such as Tez and Spark.
  • Involved in writing custom MapReduce programs using the Java API for data processing.
  • Integrated Maven build and designed workflows to automate the build and deploy process.
  • Involved in developing a linear regression model, built using Spark with the Scala API, to predict a continuous measurement and improve observations on the data.
  • Worked extensively with Spark and MLlib to develop a regression model for logistic information.
  • Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing.
  • Created Hive tables as per requirements, as internal or external tables defined with appropriate static and dynamic partitions and bucketing for efficiency.
  • Loaded and transformed large sets of structured and semi-structured data using Hive.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved it in Parquet format in HDFS.
  • Used Spark and Spark SQL to read the Parquet data and create tables in Hive using the Scala API.
  • Developed Hive queries for the analysts.
  • Automated hourly and daily transaction reports using Talend Open Studio.
  • Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Used the Spark application master to monitor Spark jobs and capture their logs.
  • Implemented Spark jobs using Scala, DataFrames, and the Spark SQL API for faster testing and processing of data.
  • Involved in making code changes to a workstation simulation module for processing across the cluster using spark-submit.
  • Involved in analytics and visualization of log data to estimate the error rate and study the probability of future errors using regression models.
  • Provided cluster coordination services through ZooKeeper.

Environment: Hadoop, Hive, HDFS, Spark, Spark SQL, Kafka, Java, Scala, Pig, Sqoop, Oozie, Shell Scripting, SQL, Talend, HBase, Hortonworks.

Confidential, College Park MD

AWS Architect

Responsibilities:

  • Migrated an existing application to the AWS cloud.
  • Responsible for designing, building, and maintaining multiple AWS infrastructures to support multiple applications.
  • Managing IAM accounts (with MFA) and IAM policies to meet security audit & compliance requirements.
  • Using Docker containers for local and cloud-based development.
  • Executed proofs of concept for configuration management and CI/CD (continuous integration / continuous deployment) practices, assessing new products and methods and developing and implementing appropriate practices across multiple development environments.
  • Responsible for the day-to-day operations of all in-house developed, open source, and commercial DevOps tooling owned by the team, ensuring system availability, performance, capacity, and monitoring through proper response to incidents, events, and problems.
  • Used CloudWatch to monitor resources such as EC2, EBS, ELB, RDS, and S3.
  • Designed and configured the AWS Simple Notification Service (SNS) and Simple Email Service (SES) architecture of the solution, working with a client.
  • Deployed a JSON template to create a stack in CloudFormation that includes services such as Amazon EC2, Amazon S3, Amazon RDS, Amazon Elastic Load Balancing, Amazon VPC, and other services of the AWS infrastructure.
  • Implemented AWS infrastructure security designs, including AWS Shield, Application Load Balancers, CloudFormation, Route 53, and Elastic Beanstalk.
  • Work as part of a hands-on team to collaborate on designs, implementation, tuning and support of our security systems at various layers.
  • Develop or modify SQL queries and stored procedures to meet business requirements and achieve desired performance.

Environment: AWS (EC2, CloudFormation, VPC, RDS, ELB, S3, Route 53, Elastic Beanstalk, SNS, SES, CloudWatch).

Confidential, Bellevue WA

Hadoop Admin/Developer

Responsibilities:

  • Installed and configured Hadoop; responsible for maintaining the cluster and for managing and reviewing Hadoop log files.
  • Developed and maintained a real-time stream-processing pipeline that transfers and processes data using Apache Storm.
  • Developed simple and complex MapReduce programs in Java for Data Analysis.
  • Integrated the Hive warehouse with HBase.
  • Worked extensively on Hive and Pig.
  • Worked on large sets of structured, semi-structured and unstructured data.
  • Developed Pig Latin scripts to explore the data.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries.
  • Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
  • Extensive working knowledge of partitioned tables, UDFs, performance tuning, compression-related properties, and the Thrift server in Hive.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Oracle into HDFS using Sqoop.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behaviour.

Environment: Hadoop, MapReduce, HDFS, RHEL, Hive, Pig, Java (JDK 1.7), SQL, HDP 2.4, Sqoop, Oozie, Eclipse

Confidential, New York NY

Hadoop Architect/Developer

Responsibilities:

  • Developed and maintained a Big Data pipeline that transfers and processes data using Apache Spark.
  • Responsible for migrating Hadoop workloads to the Spark framework for in-memory distributed computing on real-time data.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, Spark and loaded data into HDFS.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Explored Spark to improve the performance of, and optimize, existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and YARN.
  • Migrated data from Oracle and MySQL into HDFS using Sqoop and imported flat files of various formats into HDFS.
  • Designed batch ingestion components using Sqoop scripts, and data integration and processing components using shell, Pig, and Hive scripts.
  • Proposed an automated system that uses shell scripts to run the Sqoop jobs.
  • Worked in an Agile development environment.
  • Used Flume to move data from servers to HDFS.
  • Implemented a Spark Streaming framework that processes data from Kafka and performs analytics on top of it.
  • Developed a data pipeline for data processing using Spark SQL API.
  • Created the estimates and defined the sprint stages.
  • Developed a strategy for Full load and incremental load using Sqoop.
  • Worked on Hive queries to categorize data of different claims.
  • Wrote custom Hive UDFs in Java where the required functionality was too complex for standard Hive queries.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Maintained System integrity of all sub-components (primarily HDFS, MR, HBase and Hive).
  • Monitored system health and logs and responded accordingly to any warning or failure conditions.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.

Environment: Hadoop, MapReduce v2 (YARN), HDFS, Hive, Pig, Java, SQL, Hortonworks, Sqoop, Oracle, MySQL, Tableau, Talend, Elasticsearch, Oozie, Spark Core, Spark SQL, Spark Streaming, Kafka, Flume, Eclipse

Confidential, Manassas VA

Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Involved in loading data from the UNIX file system to HDFS using Flume, Kettle, and the HDFS API.
  • Wrote MapReduce jobs to discover trends in data usage by users.
  • Involved in managing and reviewing Hadoop log files.
  • Involved in running Hadoop streaming jobs to process terabytes of text data.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data with MapReduce and Pig.
  • Wrote Pig UDFs.
  • Developed Hive queries for the analysts.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Exported the result set from Hive to MySQL using Kettle (Pentaho data-integration tool).
  • Used ZooKeeper for various types of centralized configurations.
  • Designed and developed ETL jobs using Talend BigData ETL.
  • Implemented Fair schedulers on the Job tracker to share the resources of the Cluster for the Map
  • Automated all jobs, from pulling data from sources such as MySQL and pushing the result datasets to the Hadoop Distributed File System to running MR and Pig/Hive jobs, using Kettle and Oozie (workflow management).
  • Maintained System integrity of all sub-components (primarily HDFS, MR and Flume).
  • Wrote unit test cases using MR Unit.
  • Monitored system health and logs and responded accordingly to any warning or failure conditions.

Environment: Hadoop (Cloudera), HDFS, MapReduce, Hive, Pig, Sqoop, WebSphere, Struts, Hibernate, Spring, Oozie, REST Web Services, AWS, Solaris, DB2, UNIX Shell Scripting, Kettle.

Confidential, Boise ID

Java Developer

Responsibilities:

  • Involved in translating business domain concepts into use cases, sequence diagrams, class diagrams, component diagrams, and implementation diagrams.
  • Implemented various J2EE Design Patterns such as Model-View-Controller, Data Access Object, Business Delegate and Transfer Object.
  • Responsible for analysis and design of the application based on MVC Architecture, using open source Struts Framework.
  • Involved in configuring Struts, Tiles and developing the configuration files.
  • Developed Struts Action classes and Validation classes using Struts controller component and Struts validation framework.
  • Developed and deployed UI layer logic using JSP, XML, JavaScript, and HTML/DHTML.
  • Used Spring Framework and integrated it with Struts.
  • Involved in configuring web.xml and struts-config.xml according to the Struts framework.
  • Designed a lightweight model for the product using the Inversion of Control principle and implemented it successfully using the Spring IoC container.
  • Used transaction interceptor provided by Spring for declarative Transaction Management.
  • Provided connections using JDBC to the database and developed SQL queries to manipulate the data.
  • Developed DAOs using Spring JdbcTemplate to run performance-intensive queries.
  • Developed Ant scripts for automatic generation and deployment of the web service.
  • Wrote stored procedures and used Java APIs to call them.
  • Developed various test cases, such as unit tests, mock tests, and integration tests, using JUnit.
  • Experienced in writing stored procedures, functions, and packages.
  • Used Log4j to perform logging in the applications.

Environment: Java, J2EE, Struts MVC, Tiles, JDBC, JSP, JavaScript, HTML, Spring IoC, Spring AOP, JAX-WS, Ant, WebSphere Application Server, Oracle, JUnit, Log4j, Eclipse
