Senior Hadoop Developer Resume
San Jose, CA
SUMMARY:
- 8 years of experience in the design, development, and implementation of robust technology systems, with specialized expertise as a Hadoop Developer. Able to understand business and technical requirements quickly; excellent communication skills and work ethic; able to work independently.
- 4 years of experience as a Hadoop Developer with a strong command of the Hadoop framework, HDFS, and parallel processing architecture.
- Experience with the complete software development life cycle, including design, development, testing, and implementation of moderately to highly complex systems.
- Experience in serving as a NIFI admin to support production jobs.
- Experience in setting up a two-node NIFI cluster to increase performance (through additional processing resources) and to achieve high availability.
- Hands-on experience with Big Data tools such as Teradata.
- Involved in developing custom NIFI processors to ingest data into Teradata.
- Experience in using the Teradata Connector for Hadoop (TDCH) to ingest data from Teradata into Hive.
- Experience in developing Spark code in Python and Spark SQL for faster processing and testing of data, and in exploring optimizations using Spark best practices.
- Hands-on experience in analyzing streaming data with Apache Spark and Storm over an HBase data store.
- Used Spark API over Hortonworks Hadoop YARN to perform analytics in Hive.
- Experience in importing data from different sources such as HDFS and HBase into Spark RDDs.
- Hands-on experience in partitioning data using Hive.
- Experience in writing custom UDFs and UDAFs to achieve functionality not available in core Hive and Pig.
- Hands-on experience with Java, Python, and Scala.
- Experience in writing optimized MapReduce programs for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and compressed file formats.
- Hands-on experience in importing and exporting data between databases such as MySQL and Teradata and HDFS using Sqoop.
- Experience in writing SQL queries and passing them to the Parsing Engine to generate execution plans in Teradata.
- Experience in using different numbers of AMPs on each data node to store and convert data.
- Strong experience in working with Storm to import and export batch-processing data.
- Experience in exporting and importing streaming data and logs using Flume.
- Experience in using the Kafka Streams API to build applications that perform stateful computations.
- Experience in using Kafka as a streaming platform for storing and processing historical data.
- Hands-on experience in using Spark Streaming to receive input data streams from sources such as Kafka and Flume.
- Hands-on experience in using distributed file systems such as HDFS to store static files for batch processing.
- Hands-on experience with analytic MPP databases such as Apache Impala to obtain instant insights from data.
- Hands-on experience in using HBase coprocessors to run business computation code on the server side and return the result to the client.
- Experience in working with HBase cell versioning, retaining 1, 2, or 3 versions per cell.
- Hands-on experience in working with Apache Cassandra command-line tools such as nodetool and the CQL shell.
- Experience in integrating Cassandra with Hive to load data into Cassandra while avoiding data loss and network bottlenecks.
- Hands-on experience in working with Amazon cloud-based products such as Amazon EC2.
- Hands-on experience in working with Talend Open Studio for Big Data to translate ETL jobs into MapReduce jobs.
- Experience in working with SSIS components to perform a broad range of data migration tasks.
- Experience in preparing reports using data visualization tools like Tableau.
- Experience in using the Oozie workflow scheduler to manage Hadoop jobs as a Directed Acyclic Graph (DAG) of actions with control flows.
- Experience in development tools such as Maven and Artifactory.
- Experience in working on micro-service architecture to process large volumes of data.
- Experience in developing Sqoop scripts, Pig scripts, and Hive queries and orchestrating them with Oozie workflows and sub-workflows.
- Experience in job scheduling tools like AutoSys.
- Experience in developing data pipelines using Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Experience in working on Agile and Waterfall software development methodologies.
- Experience in working with test-driven development (TDD).
TECHNICAL SKILLS:
Hadoop Framework: HDFS, MapReduce, Pig, Hive, Apache NIFI, HBase, Sqoop, Zookeeper, Oozie, Storm, Kafka, Spark, Apache Impala, Flume.
NoSQL Databases: HBase, Apache Cassandra
Programming Language: Java, Python, Scala.
Microsoft: MS Office, MS Project, MS Visio, MS Visual Studio 2003/2005/2008
Databases: MySQL, Oracle 8i/9i/10g, SQL Server, PL/SQL Developer.
Operating Systems: Linux, CentOS, RHEL, Windows 2000/2003/2008/XP/Vista
Scripting: Shell Scripting, HTML Scripting, Puppet
Programming: C, C++, Core Java, PL/SQL.
Web Servers: Apache Tomcat, JBoss, and Apache HTTP Server cluster
Management Tools: HDP Ambari, Cloudera Manager.
IDEs and Tools: NetBeans, Eclipse, Microsoft SQL Server, MS Office
PROFESSIONAL EXPERIENCE:
Senior Hadoop Developer
Confidential
Responsibilities:
- Currently working as a developer on the Hortonworks distribution.
- Responsible for Data Ingestion, Data Cleansing, Data Standardization and Data Transformation.
- Served as a NIFI admin to support the production jobs.
- Set up a two-node NIFI cluster to increase performance (through additional processing resources) and to achieve high availability.
- Gained deep knowledge of the NIFI repositories and their functions: Content Repository, FlowFile Repository, Provenance Repository, and Database Repository.
- Developed automated installation scripts for NIFI Installation and Upgrade.
- Used the custom processors developed by our team.
- Hands-on experience with Big Data tools such as Teradata.
- Involved in data ingestion from the source (SQL Server) to Teradata and from Teradata to Hadoop (Hive) using TDCH.
- Hands-on experience with Teradata SQL and stored procedures.
- Used NIFI as an ingestion tool to ingest data from the source to Teradata and from Teradata to Hadoop using custom processors.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS.
- Used various Spark Transformations and Actions for cleaning the input data.
- Used Kafka for log aggregation: collecting physical log files from servers and placing them in a central location such as HDFS for processing.
- Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing.
- Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved the results in Parquet format in HDFS.
- Used Spark SQL to read the Parquet data and create tables in Hive using the Scala API.
- Involved in setting up monitoring for Hadoop production jobs using ELK.
- Involved in setting up monitoring for NIFI production jobs.
- Worked on Data Visualization tools like Tableau to visualize the data.
- Day-to-day responsibilities include developing code based on the low-level design (LLD) provided by the design team and unit testing it.
- Worked with test-driven development (TDD).
- Attended daily Scrum calls to report the status of each day's work.
- Collaborated with the Scrum Master and the development team to complete assigned tasks.
- Attended sprint demos, in which our team presented to the client at the end of every sprint.
- Attended retrospective calls after every sprint demo to discuss the pros and cons of the results so far.
Environment: HDFS, Pig, Hive, Teradata, NIFI, Tableau
Senior Hadoop Developer
Confidential
Responsibilities:
- Worked as a developer on the Hortonworks distribution.
- Responsible for Data Ingestion, Data Cleansing, Data Standardization and Data Transformation.
- Served as an SME on Master Data Management and Data governance.
- Worked on micro-service architecture to process huge volumes of data.
- Worked with ingestion tools such as Sqoop and Flume to ingest RDBMS data and streaming data.
- Worked with Storm to ingest batch-processing data.
- Worked on Spark DStreams to represent the stream of input data received from different streaming sources like Flume and Kafka.
- Worked on StreamingContext API to convert the file system data to batches of data.
- Worked on Spark Engine to generate final stream of results in batches.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Worked on transformations using SparkSQL.
- Developed Spark Applications using PySpark API.
- Involved in integrating HBase with Spark to load data into HBase.
- Involved in creating HBase tables with column families for the stored data.
- Configured HBase versioning to keep three versions per cell, retaining the three most recent values, including the current one.
- Involved in integrating MPP databases such as Impala with Spark to load data into Impala for instant insights.
- Developed Python source code in Spark to apply transformations for faster data processing.
- Involved in creating tables on processed data using HiveQL.
- Developed customized UDFs and UDAFs in Hive and Pig to achieve the desired functionality.
- Worked with Talend Open Studio for Big Data to translate ETL jobs into MapReduce jobs.
- Involved in writing SQL queries and passing them to the Parsing Engine to generate execution plans in Teradata.
- Worked with different numbers of AMPs on each data node to store and convert data.
- Worked with data serialization formats (Avro, JSON, and CSV) to convert complex objects into sequences of bits.
- Worked on workflow schedulers like Azkaban and Hamake to manage the Hadoop jobs.
- Worked on JIRA software to represent the Product Backlog and Sprint Backlog.
- Worked on Data Visualization tools like Tableau to visualize the data.
- Day-to-day responsibilities included developing code based on the low-level design (LLD) provided by the design team and unit testing it.
- Worked with test-driven development (TDD).
- Attended daily Scrum calls to report the status of each day's work.
- Collaborated with the Scrum Master and the development team to complete assigned tasks.
- Attended sprint demos, in which our team presented to the client at the end of every sprint.
- Attended retrospective calls after every sprint demo to discuss the pros and cons of the results so far.
Environment: HDFS, Pig, Hive, PySpark API, SparkSQL, Spark Streaming, Sqoop, Flume, Kafka, HBase, Impala, Talend Open Studio for Big Data, Azkaban, Hamake, Tableau, Hortonworks
Senior Hadoop Developer
Confidential, San Jose, CA
Responsibilities:
- Worked as a developer on the Cloudera distribution.
- Responsible for Data Ingestion, Data Transformation, Data Standardization and Data Cleansing.
- Worked with ingestion tools such as Sqoop to import and export data between databases such as MySQL and Teradata and HDFS.
- Worked on partitioning data using Hive.
- Developed customized UDFs in Pig and Hive to achieve the desired functionality.
- Worked on MapReduce (MR2) programs to process data on HDFS.
- Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Supported design and analysis by delivering proofs of concept (POCs) using Cassandra.
- Involved in integrating Cassandra with Hive to load data into Cassandra.
- Worked closely on implementing the Cassandra data model in the application environment, ensuring that the solution did not affect existing business-as-usual operations.
- Worked on Apache Cassandra command line tools like Nodetool and CQL shell to execute the commands.
- Performed analysis using MPP databases such as Apache Impala to obtain instant insights from the data.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig. Developed Pig scripts to pull data from HDFS.
- Worked with data serialization formats (Avro, JSON, and CSV) to convert complex objects into sequences of bits.
- Worked on JIRA software to represent the Product Backlog and Sprint Backlog.
- Worked on Data Visualization tools like Tableau to visualize the data.
- Day-to-day responsibilities included developing code based on the low-level design (LLD) provided by the design team and unit testing it.
- Attended daily Scrum calls to report the status of each day's work.
- Collaborated with the Scrum Master and the development team to complete assigned tasks.
- Attended sprint demos, in which our team presented to the client at the end of every sprint.
- Attended retrospective calls after every sprint demo to discuss the pros and cons of the results so far.
Environment: HDFS, MapReduce, Hive, Pig, Sqoop, Apache Cassandra, Apache Impala, Teradata, Cloudera distribution, Oozie.
Hadoop Developer
Confidential, Carrollton, TX
Responsibilities:
- Responsible for data Ingestion, data transformation, data standardization and data cleansing.
- Worked with the BI and design teams to prepare the low-level and high-level design documents.
- Day-to-day responsibilities included developing code based on the low-level design (LLD) provided by the design team and unit testing it.
- Involved in identifying possible ways to improve the efficiency of the system.
- Developed multiple MapReduce jobs in Java for log data cleaning and preprocessing, and scheduled the jobs to collect and aggregate the logs on an hourly basis.
- Worked on the logical implementation of, and interaction with, HBase.
- Efficiently wrote data to and fetched data from HBase by writing MapReduce jobs.
- Developed MapReduce jobs to automate transfer of data from/to HBase.
- Assisted with the addition of Hadoop processing to the IT infrastructure.
- Used Flume to collect web logs from the online ad servers and push them into HDFS.
- Implemented and executed MapReduce jobs to process the log data from the ad servers.
- Wrote efficient MapReduce code to aggregate the log data from the ad servers.
- Worked on Partitioning the data using Hive.
- Used Hive to analyze the Partitioned data and compute various metrics for reporting.
- Developed customized UDFs in Hive to implement the business logic.
- Worked with the Oozie workflow scheduler to schedule Hive and MapReduce jobs as a Directed Acyclic Graph (DAG).
- Gained a deep and thorough understanding of ETL tools such as Informatica and Talend and of how data can be migrated to a Big Data environment.
Environment: MapReduce, HDFS, Pig, Hive, HBase, Oozie, Informatica, Talend, Tableau.
Data Warehouse Developer
Confidential
Responsibilities:
- Involved in design & development of operational data source and data marts in Oracle.
- Involved in conceptual, logical and physical data modelling and used star schema in designing the data warehouse.
- Designed ETL process using Informatica designer to load the data from various source databases and flat files to target data warehouse in Oracle.
- Used the PowerMart Workflow Manager to design sessions and event wait/raise, assignment, email, and command tasks to execute mappings.
- Created parameter-based mappings and Router and Lookup transformations.
- Created Mapplets to reuse transformations in several mappings.
- Optimized mappings using transformation features like Aggregator, Filter, Joiner, Expression, Lookups.
- Worked extensively with Oracle Database 9i and flat files as data sources.
- Performed integrated testing for various mappings. Tested the data and data integrity among various sources and targets.
- Created daily and weekly workflows and scheduled them to meet business needs.
- Created daily and weekly ETL processes that maintain 200 GB of data in the target database.
- Followed the Waterfall software development methodology.
Environment: Oracle 9i, Informatica PowerCenter, JSP, JavaScript
PL/SQL Developer
Confidential
Responsibilities:
- Collaborated with the team to analyze, design, and develop the database using ER diagrams; involved in the design, development, and testing of the system.
- Developed SQL Server stored procedures and tuned SQL queries using indexes.
- Worked with SSIS components to perform a broad range of data migration tasks.
- Created views to simplify user interface implementation, and triggers on them to enforce consistent data entry into the database.
- Generated customized staging tables to handle imported data, and functions for duty calculations and input validation.
- Used Visual SourceSafe extensively for version control and source code management.
- Built database objects (tables, indexes, sequences, and constraints) according to the organization's requirements.
- Wrote shell scripts to load flat files into the database using SQL*Loader.
- Customized forms to capture customer-related data and developed a customer information report, payment performance report, sales order report, rebate calculation report, etc., using Forms/Reports 11g.
- Worked on the ETL side to load data into the database from different servers.
- Responsible for designing the mappings, logic, sessions, workflows, and worklets.
Environment: Linux 5/4, Sun Solaris 10/9, Oracle 10g, SUN Servers, SUN Fires, HP OpenView Service Desk (OVSD), Kickstart, JumpStart, Fujitsu PRIMEPOWER servers, Samba, AutoSys, SSIS, VERITAS Volume Manager (VxVM), LDAP, EMC Storage SAN, VERITAS Cluster Server (VCS), VMware servers, WebLogic, JBoss, and Apache.
