Sr. Hadoop Developer Resume
Herndon, VA
PROFESSIONAL SUMMARY:
- Experienced Hadoop developer with 8+ years of programming experience, including 4+ years of hands-on experience in Big Data environments.
- In-depth experience with Hadoop ecosystem tools such as MapReduce, HDFS, Pig, Hive, Kafka, YARN, Sqoop, Storm, Spark, Oozie, and ZooKeeper.
- Extensive knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode, and of the MapReduce programming paradigm.
- Experience with Apache Hadoop and the Cloudera and Hortonworks enterprise distributions; good knowledge of the MapR distribution and Amazon EMR.
- Good knowledge of data modeling, use case design, and object-oriented concepts.
- Well versed in installing, configuring, supporting, and managing Big Data tools and the underlying Hadoop cluster infrastructure.
- Good knowledge of Spark components such as Spark SQL, MLlib, Spark Streaming, and GraphX.
- Worked extensively with Spark Streaming and Apache Kafka to ingest live streaming data.
- Experience in converting Hive/SQL queries into RDD transformations using Apache Spark, Scala and Python.
- Implemented dynamic partitioning and bucketing in Hive for efficient data access.
- Experience collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
- Integrated Hive queries into the Spark environment using Spark SQL.
- Hands-on experience performing real-time analytics on big data using HBase and Cassandra in Kubernetes and Hadoop clusters.
- Experience using Flume to stream data into HDFS.
- Good working experience using Sqoop to import data from RDBMS into HDFS and export it back.
- Good knowledge of developing data pipelines using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Created User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) in Pig and Hive.
- Good knowledge of job scheduling and coordination tools such as Oozie (workflow scheduling) and ZooKeeper (cluster coordination).
- Hands-on experience with NoSQL databases including HBase, Cassandra, and MongoDB, and their integration with Hadoop and Kubernetes clusters.
- Proficient in cluster management and configuration of Cassandra databases.
- Extensive experience developing Pig Latin scripts and using Hive Query Language (HiveQL) for data analytics.
- Good working experience with file formats (Parquet, TextFile, Avro, ORC) and compression codecs (Gzip, Snappy, LZO).
- Practical experience implementing AWS technologies including IAM, Elastic Compute Cloud (EC2), ElastiCache, Simple Storage Service (S3), CloudFormation, Virtual Private Cloud (VPC), Route 53, Lambda, and EBS.
- Built secure AWS solutions by creating VPCs with private and public subnets.
- Expertise in configuring Amazon Relational Database Service (RDS).
- Worked extensively on configuring Auto Scaling for high availability.
- Knowledge of data warehousing and ETL tools like Informatica, Talend and Pentaho.
- Experience working with Java/J2EE, JDBC, ODBC, JSP, JavaBeans, EJB, and Servlets in Eclipse.
- Expert in developing user interfaces using JSP, HTML, and Java Swing.
- Experience working with the Spring and Hibernate frameworks for Java.
- Experience using IDEs such as Eclipse, NetBeans, and IntelliJ IDEA.
- Proficient with version control tools such as Git, VSS, SVN, and PVCS.
- Experience with web-based UI development using jQuery UI, jQuery, CSS, HTML, HTML5, XHTML, and JavaScript.
- Development experience with databases such as Oracle, MS SQL Server, Teradata, and MySQL.
- Developed stored procedures and queries using PL/SQL.
- Hands-on experience with best practices of web services development and integration (both REST and SOAP).
- Experience with build tools such as Ant, Maven, SBT, and Gradle to build and deploy applications to servers.
- Expertise in Object Oriented Analysis and Design (OOAD) and knowledge in Unified Modeling Language (UML).
- Expertise in the complete Software Development Life Cycle (SDLC) under Waterfall and Agile (Scrum) models.
- Excellent communication, interpersonal, and problem-solving skills; a strong team player with a can-do attitude and the ability to communicate effectively with all levels of the organization, including technical staff, management, and customers.
TECHNICAL SKILLS:
Big Data Technologies: HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, Storm, Scala, Spark, Apache Kafka, Flume, Solr, Elasticsearch, Ambari, Ab Initio
Database: Oracle 10g/11g, PL/SQL, MySQL, MS SQL Server 2012
SQL Server Tools: Enterprise Manager, SQL Profiler, Query Analyzer, SQL Server 2008, SQL Server 2005 Management Studio, DTS, SSIS, SSRS, SSAS
Languages: C, C++, Java, Python, Scala
AWS Components: S3, EMR, EC2, Lambda, VPC, Route 53, CloudWatch
Development Methodologies: Agile, Waterfall
Testing: JUnit, Selenium WebDriver
NoSQL Databases and Data Warehouses: HBase, Cassandra, MongoDB, Neo4j, Redis, Redshift
ETL and Reporting Tools: Talend Open Studio, Pentaho, Tableau
IDE Tools: Eclipse, NetBeans, IntelliJ IDEA
Modelling Tools: Rational Rose, StarUML, Visual Paradigm for UML
Architecture: Relational DBMS, Client-Server Architecture
Cloud Platforms: AWS Cloud
Operating System: Windows 7/8/10, Vista, UNIX, Linux, Ubuntu, Mac OS X
PROFESSIONAL EXPERIENCE:
Confidential, Herndon, VA
Sr. Hadoop Developer
Responsibilities:
- Worked with the Hortonworks distribution.
- Experienced in loading data from various relational databases into HDFS using Sqoop.
- Created Hive internal/external tables with appropriate static and dynamic partitions and worked on them using HiveQL.
- Ingested data from various sources into HDFS and built reports using Tableau.
- Wrote several MapReduce jobs using the Java API and used Jenkins for continuous integration.
- Collected log data from web servers and ingested it into HDFS using Flume.
- Analyzed the Hadoop stack and various big data analytic tools including Pig, Hive, HBase, and Sqoop.
- Used the Spark-Cassandra Connector to load data to and from Cassandra.
- Built, managed, and scheduled Oozie workflows for end-to-end job processing.
- Extended Hive and Pig core functionality by writing custom UDFs in Java.
- Analyzed large volumes of structured data using Spark SQL.
- Migrated HiveQL queries on structured data to Spark SQL to improve performance.
- Created Hive tables, loaded data into them, and loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Strong understanding of partitioning and bucketing in Hive; designed both managed and external Hive tables to optimize performance.
- Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data as DataFrames, and loaded it into Cassandra.
- Used the Spark Streaming API to perform transformations and actions on the fly to build a common learner data model that receives data from Kafka in real time.
- Developed custom components and multi-threaded configurations for flat files by writing Java code in Talend.
- Wrote Spark Streaming jobs in Scala.
- Extensive experience manually creating AWS Lambda functions in Java and Python.
- Created an S3 bucket and configured it to pass event notifications to Lambda.
- Used automation scheduling tools such as AutoSys and Control-M.
- Knowledge of building Spring and REST services and connecting them to the Kubernetes cluster.
- Extensively used microservices and Postman to send requests to the Kubernetes DEV and Hadoop clusters.
- Deployed various services such as Spark, MongoDB, and Cassandra to Kubernetes and Hadoop clusters using Docker.
- Launched Kinesis Streams in new AWS regions and granted access to new users through IAM.
- Set up Kerberos authentication principals to establish secure network communication on the cluster, and tested HDFS, Hive, Pig, and MapReduce access to the cluster for new users.
- Used Amazon Kinesis as a messaging system to deliver weblogs and data from REST endpoints into Amazon Redshift.
- Used Hortonworks Ambari to browse jobs and files and to run Hive and Impala queries.
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS and in databases such as HBase.
- Worked with RapidMiner, a data science tool, and created several Hadoop operators such as a Hive Operator, Spark Operator, and Mongo Operator.
- Designed ETL processes in Talend to load data from sources to targets through data transformations.
- Applied ETL methodology for data migration, data profiling, extraction, transformation, and loading using Talend, and designed data conversions from a wide variety of source systems including Oracle, DB2, SQL Server, Teradata, and Hive, as well as non-relational sources such as flat files and XML files.
- Implemented a security sidecar and AAF for authentication on the Hadoop cluster.
- Used Python for pattern matching in build logs to format errors and warnings.
- Helped create a UI using Node.js that calls various microservices to set up the frontend.
- Used CodeCloud, a Git repository, for continuous code check-ins.
- Responsible for development, support, and maintenance of ETL (Extract, Transform, and Load) processes using Talend.
- Developed Talend jobs to load data into Hive tables and HDFS files, and to integrate Hive tables with the Teradata system.
- Assisted with testing and monitoring of the Control-M upgrade and active schedule functions, ensuring effective issue resolution and continuous operations.
- Extensive experience using microservices, Docker, and Kubernetes (prod, test, dev) environments.
- Developed several seed templates, such as for Scala, MongoDB, Zeppelin, and Spark.
- Configured YAML files for Spark, MongoDB, and Cassandra and deployed them in Docker to connect to several microservices.
Environment: Hadoop, Map Reduce, HDFS, Hive, Cassandra, Sqoop, Oozie, SQL, Kafka, Spark, Scala, Java, AWS, GitHub, Talend Big Data Integration, Impala.
Confidential, Green, OH
Hadoop Developer
Responsibilities:
- Evaluated business requirements and prepared detailed specifications, following project guidelines, required to develop programs.
- Performed data acquisition, data pre-processing, and data exploration for a telecommunications project in Scala.
- Worked with the Cloudera distribution deployed on AWS EC2 instances.
- Hands-on experience using Cloudera Hue to import data through its graphical user interface.
- Responsible for building scalable distributed data solutions using Hadoop.
- Loaded structured and unstructured data into HDFS.
- Imported metadata from relational databases such as Oracle and MySQL using Sqoop.
- Implemented a web interface to Hive and stored the data in Hive tables.
- Regularly loaded data from MySQL into HDFS using Sqoop import/export.
- Responsible for converting MapReduce programs into Spark transformations using Spark and Scala.
- Developed Kafka consumers in Scala using the Kafka Consumer API to consume data from Kafka topics.
- Used Spark Streaming APIs to perform transformations and actions on the fly to build a common learner data model that receives data from Kafka in near real time and persists it to Cassandra.
- Loaded CSV/TXT/Avro/Parquet files using Scala and Java in the Spark framework, processed the data by creating Spark DataFrames and RDDs, and saved it in Parquet format in HDFS to load into the fact table using the ORC Reader.
- Developed and Scheduled Jobs in Talend Integration Suite.
- Loaded data from REST endpoints into Kafka producers and transferred it to Kafka brokers.
- Wrote real-time processing and core jobs in Scala using Spark Streaming, with Kafka as the data pipeline system.
- Developed Amazon Kinesis streams to capture real-time data from web servers.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Ingested data in mini-batches and performed RDD transformations on those mini-batches of data.
- Good knowledge of setting batch intervals, slide intervals, and window intervals in Spark Streaming using Scala.
- Imported real-time weblogs using Kafka as a messaging system and ingested the data into Spark Streaming.
- Implemented data quality checks using Spark Streaming and flagged records as bad or passable.
- Used Spark SQL with various data sources such as JSON, Parquet, ORC, and Hive.
- Collected XML and JSON data from different sources, developed Spark APIs to perform inserts and updates in Hive tables, and made the data available in Hive and Impala per business requirements.
- Created separate branches within the Talend repository for Development, Production, and Deployment.
- Used Flume to collect, aggregate, and load log data from multiple sources into HDFS.
- Performed data querying and summarization using Pig and Hive and created UDFs, UDAFs, and UDTFs.
- Worked with different file formats, map-side joins, bucketing, and partitioning to improve Hive performance and storage.
- Implemented Sqoop jobs for large data exchanges between RDBMS and HBase/Hive/Cassandra clusters.
- Used Spark Core to join data to deliver reports and to detect fraudulent activity.
- Participated in a large data migration from 80+ MySQL tables to JSON format using Talend.
- Extensively used ZooKeeper for coordination and standby (backup) master recovery for Spark jobs.
- Implemented Hive Partitioning and Bucketing on the collected data in HDFS.
- Experienced with SparkContext, Spark SQL, DataFrames, Datasets, and Spark on YARN.
- Knowledge of the MLlib (Machine Learning Library) framework for auto-suggestions.
- Used the receiverless Kafka Direct Stream approach, in which each Kafka partition maps to a Spark RDD partition.
- Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations.
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and ZooKeeper.
- Loaded real-time data into NoSQL databases such as Cassandra.
- Used the DataStax Spark Cassandra Connector to store data from Spark in the Cassandra database.
- Participated in NoSQL (DataStax Cassandra) database design, integration, and implementation; wrote scripts and invoked them using cqlsh.
- Good knowledge of data manipulation, tombstones, and compaction in Cassandra.
- Designed column families in Cassandra, ingested data from RDBMS, performed data transformations, and exported the transformed data to Cassandra per business requirements.
- Wrote CQL (Cassandra Query Language) queries to retrieve data from the Cassandra cluster.
- Maintained the Big Data servers using Ganglia and Nagios.
- Connected the Cassandra database to the Amazon EMR File System (EMRFS) to store the database in S3.
- Deployed the project on Amazon EMR with S3 connectivity for backup storage.
- Used Amazon EMR to process Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
- Worked on production servers in the AWS cloud (EC2, EBS, S3, Lambda, and Route 53).
- Loaded the data into Simple Storage Service (S3) in the AWS Cloud.
- Used parallel connectors for parallel processing to improve job performance with bulk data sources in Talend.
- Good knowledge of using Elastic Load Balancing with Auto Scaling for EC2 servers.
- Defined security groups for Amazon EC2 servers in a Virtual Private Cloud (VPC).
- Configured workflows involving Hadoop actions using the Oozie client.
- Used ETL tools such as Informatica and Talend, and migrated workflows from Informatica to Talend.
- Coordinated with admins and senior technical staff to migrate from Teradata and Ab Initio to Hadoop.
- Worked on a cluster of 150-200 nodes.
- Experienced with full-text search and faceted search using Solr.
- Wrote Java code to format XML documents and upload them to the Solr server for indexing.
- Used reporting tools such as Tableau to generate reports.
- Developed Informatica PowerCenter mappings to extract data from various databases and flat files and load it into the data mart.
- Worked with the Scrum team to deliver agreed user stories on time in every sprint.
Environment: Hadoop YARN, Spark-Core, Spark-Streaming, AWS S3, AWS EMR, Spark-SQL, GraphX, Scala, Python, Kafka, Zeppelin, Jenkins, Docker, Microservices, Hive, Pig, Sqoop, Impala, Cassandra, Informatica, Cloudera, Oracle 10g, Linux.
Confidential - San Francisco, CA
Hadoop Developer
Responsibilities:
- Developed MapReduce/EMR jobs to analyze data and provide heuristics and reports used to improve campaign targeting and efficiency.
- Wrote complex MapReduce jobs in Java to extract, transform, and aggregate terabytes of data.
- Responsible for building scalable distributed data solutions using Hadoop.
- Used Oozie workflows and enabled email alerts for any failures.
- Developed simple to complex MapReduce jobs implemented using Hive and Pig.
- Analyzed the data by running Hive queries (HiveQL) and Pig scripts (Pig Latin) to study customer behavior and used UDFs to implement business logic in Hadoop.
- Implemented business logic by writing UDFs in Java and used various UDFs from other sources.
- Managed and reviewed Hadoop log files; deployed and maintained the Hadoop cluster.
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX systems and NoSQL sources.
- Supported HBase architecture design with the Hadoop architect group to build a database design in HDFS.
- Scripted complex HiveQL queries on Hive tables to analyze large datasets and wrote complex Hive UDFs to work with sequence files.
- Created, dropped, and altered HBase tables at run time without blocking updates and queries.
- Wrote Flume configuration files to import streaming log data into HBase.
- Experience implementing solutions with Azure PaaS services such as Web Sites, Azure SQL Database, Storage, and Cloud Services.
- Exported data from the HDFS environment into RDBMS using Sqoop for report generation and visualization purposes.
- Loaded data into HBase using Pig, Hive, and Java APIs.
- Handled incoming messages using the Play Framework (MVC).
- Managed and reviewed Hadoop log files to identify issues when jobs failed.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Implemented frameworks in Java and Python to automate the ingestion flow.
- Tuned the performance of Pig queries.
- Mentored the analyst and test teams in writing Hive queries.
- Troubleshot issues, managed and reviewed data backups, and managed and reviewed Hadoop log files.
Environment: Java J2EE, Hadoop, AWS, Cloudera, Cassandra, HDFS, Flume, Hive, Kafka, Impala, Oozie, MapReduce, SQL, Sqoop, LINUX, HBase, Scala, Spark, MapR, Big Data, UNIX Shell Scripting, Storm, Agile.
Confidential - Bethesda, MD
Java/Hadoop Developer
Responsibilities:
- Analyzed Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase and Sqoop.
- Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Imported and exported data into and out of HDFS and Hive using Sqoop.
- Imported the web logs using Flume.
- Used Hive to analyze the partitioned and bucketed data to compute various metrics for reporting.
- Coordinated with business customers to gather business requirements, interacted with other technical peers to derive technical requirements, and delivered the BRD and TDD documents.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Created Hive tables, loaded data, and wrote queries that run internally as MapReduce jobs.
- Created Hive tables, loaded the data, and performed data manipulations using Hive queries in MapReduce execution mode.
- Processed ingested raw data using MapReduce, Apache Pig, and HBase.
- Developed Pig scripts for change data capture and delta record processing between newly arrived data and existing data in HDFS.
- Scheduled multiple Hive and Pig jobs using the Oozie workflow engine.
- Loaded generated HFiles into HBase for faster access to a large customer base without a performance hit.
- Created HBase tables to store various data formats of PII data coming from different portfolios.
- Extensively involved in Design phase and delivered Design documents.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
- Implemented test scripts to support test driven development and continuous integration.
- Used Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Developed Hadoop Streaming MapReduce jobs using Python.
- Used Talend to connect to Hive for generating daily data reports.
- Set up Solr for distributed indexing and search.
Environment: CDH 3.x and 4.x, Hadoop, Hive, MapReduce, Pig, Oozie, Sqoop, Cloudera, HDFS, Solr, Zookeeper, HBase.
Metanoia Software Solutions, India Jan ’12 - May ’14
Java Developer
Responsibilities:
- Responsible for analyzing and documenting requirements and for designing and developing the application based on J2EE standards. Strictly followed Test-Driven Development.
- Used Microsoft Visio to design use cases, class diagrams, sequence diagrams, and data models.
- Extensively developed user interface using HTML, JavaScript, jQuery, AJAX and CSS on the front end.
- Designed a Rich Internet Application using jQuery-based accordion styles.
- Used JavaScript for the client-side web page validation.
- Used Spring MVC and Dependency Injection for handling presentation and business logic. Integrated Spring DAO for data access using Hibernate.
- Used Ajax and JavaScript to perform client-side validations.
- Used RESTful web services with MVC for parsing and processing XML data.
- Utilized XML and XSL Transformation for dynamic web-content and database connectivity.
- Implemented design patterns such as Singleton and MVC.
- Experience with SOAP Web Services and WSDL.
- Used Ant to build and deploy WAR files and SVN for version control.
- Used Test First Development for development of the project.
- Used Spring ORM module for integration with Hibernate for persistence layer.
- Wrote stored procedures and triggers and created tables in the Oracle database.
- Refactored code to improve readability, simplify code structure, and improve maintainability.
- Assisted QA team in Test cases preparation, execution and fixing of bugs.
Environment: J2SE 1.5, Servlets, WebLogic, Spring, Hibernate, JDBC, Oracle 9i, SOAP, WSDL, REST, XML, XSLT, Eclipse, HTML, CSS, JavaScript, JSF, ANT, SVN, Log4J, JUnit
Thirdware Solutions Limited, India July ’09 - Dec ’11
Java Developer
Thirdware delivers industry-specific technological expertise through a range of services spanning business applications consulting, design, implementation and support. It is involved in developing Smart Data solutions - yielding clean, organized, actionable data to extract information and insight.
Responsibilities:
- Documented functional and technical requirements, wrote Technical Design Documents.
- Developed analysis level documentation such as Use Case, Business Domain Model, Activity & Sequence and Class Diagrams.
- Developed presentation layer components comprising JSP, AJAX, Servlets, and JavaBeans using the Struts framework.
- Implemented MVC (Model View Controller) architecture.
- Developed XML configuration and data description using Hibernate.
- Implemented the business logic using Spring transaction management and Spring AOP.
- Implemented persistence layer using Spring JDBC to store and update data in database.
- Produced web services using the WSDL/SOAP standard.
- Implemented J2EE design patterns such as Singleton and Factory.
- Created Session Beans and Message-Driven Beans (MDBs) using EJB 3.0.
- Used Hibernate framework for Persistence layer.
- Wrote stored procedures for data retrieval, storage, and updates in the Oracle database, used together with Hibernate.
- Built and deployed the application using Maven.
- Performed testing using JUnit.
- Used JIRA to track bugs.
- Extensively used Log4j for logging throughout the application.
- Produced a Web service using REST with Jersey implementation for providing customer information.
- Used SVN for source code versioning and code repository.
Environment: Java (JDK 1.5), J2EE, Eclipse, JSP, JavaScript, JSTL, Ajax, GWT, Log4j, CSS, XML, Spring, EJB, MDB, Hibernate, WebLogic, REST, Rational Rose, JUnit, Maven, JIRA, SVN.
