Sr. Hadoop Engineer Resume
San Jose, CA
SUMMARY:
- 8+ years of IT experience gathering and analyzing customers' technical requirements, and in development, management, maintenance, and production support projects on platforms such as Hadoop and Java.
- 3+ years of experience as a Hadoop Developer in all phases of Hadoop and HDFS development, with strong work experience in Apache Spark and data analytics.
- Knowledge of big data ingestion, storage, querying, processing, and analysis across the Hadoop ecosystem, including HDFS, MapReduce, Hive, Pig, YARN, Apache Spark, Flume, and Sqoop-based big data platforms.
- Efficient in writing MapReduce programs and using the Apache Hadoop API to analyze structured and unstructured data.
- Experienced with Hadoop ecosystem components such as MapReduce, HBase, Oozie, Hive, Sqoop, Pig, Flume, and Cassandra on the Cloudera and Hortonworks distributions.
- Expert in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
- Experience in writing Pig Latin scripts.
- Experience writing UDFs in Java for Hive and Pig.
- Hands-on experience configuring and working with Flume to load data from multiple sources directly into HDFS.
- Experience using Apache Sqoop to import and export data between relational databases and HDFS/Hive.
- Hands-on experience setting up workflows with the Apache Oozie workflow engine for managing and scheduling Hadoop jobs.
- Expertise in modeling and building the Hive metastore and creating Hive tables as required, defining internal or external tables with appropriate static and dynamic partitions for efficiency.
- Hands-on experience migrating complex MapReduce programs to Apache Spark RDD transformations using Scala.
- Experience handling messaging services using Apache Kafka.
- Good knowledge of NoSQL databases: Cassandra and HBase.
- Experience using HCatalog with Hive, Pig, and HBase.
- Working experience with Pentaho Data Integration (PDI, Kettle) for extraction, transformation, and loading (ETL), and with Tableau for visualization.
- Experience working with the BI team to translate big data requirements into Hadoop-centric solutions.
- Hands-on experience writing MapReduce jobs in Java, Scala, and Python.
- Strong knowledge of Hadoop cluster installation, capacity planning, performance tuning, benchmarking, disaster recovery planning, and application deployment in production clusters.
- Strong knowledge of the internals of HDFS and the MapReduce framework.
- Experience developing applications using core Java and web technologies.
- Exposure to Maven/Ant and Git, along with shell scripting, for build and deployment processes.
- Basic knowledge of application design using Unified Modeling Language (UML) sequence and use case diagrams, Entity Relationship Diagrams (ERD), and Data Flow Diagrams (DFD).
- Strong technical and interpersonal skills combined with a strong commitment to meeting deadlines.
- Experience in backend testing using SQL queries.
- Proficient in handling data transfers using batch processes and analyzing data flow across the whole system.
- Ability to multitask across multiple concurrent projects.
- Experienced in analyzing functional/technical design documents and creating test plans and test cases.
- Excellent knowledge of the Software Development Life Cycle (SDLC), with a thorough understanding of its phases: requirements, analysis/design, development, and testing.
- Well versed in the change management process
- Experience in Agile and Waterfall development methodologies
- Hands-on leadership, team management and interpersonal skills
- Experience working in both team and individual environments. Always eager to learn new technologies and apply them in challenging environments.
TECHNICAL SKILLS:
Hadoop Technologies and Distributions: Apache Hadoop, Cloudera Hadoop Distribution CDH3, CDH4, CDH5 and Hortonworks Data Platform (HDP)
Hadoop Ecosystem/Big Data Technologies: HDFS, MapReduce, Hive, Kafka, Pig, Sqoop, Oozie, Flume, ZooKeeper, Spark
NoSQL Databases: Cassandra
Programming: C, C++, Java, Scala, PL/SQL, ASP.NET
RDBMS: Oracle, MySQL, SQL Server
Web Development: HTML, JSP, Servlets, JavaScript, CSS, XML
IDE: Rational Rose, Eclipse, NetBeans, Microsoft Visual Studio
Operating Systems: Linux (RedHat, CentOS), Windows XP/7/8
Web Servers: Apache Tomcat
Cluster Management Tools: Cloudera Manager, Hortonworks Ambari
BI Tools: Tableau
PROFESSIONAL EXPERIENCE
Confidential, San Jose, CA
Sr. Hadoop Engineer
Responsibilities:
- Involved in extracting data from various sources into HDFS, including JIRAP, databases, and REST APIs.
- Used Sqoop to efficiently transfer data between databases and HDFS, and used Flume to stream JSON data from REST APIs.
- Expertise in writing Pig and Hive UDFs.
- Created many UDFs with complex types as inputs and outputs for analysis and aggregation.
- Created Hive tables based on requirements and loaded data into them for analysis.
- Developed Spark SQL scripts to handle different data sets and verified their performance against MapReduce jobs.
- Exported the analysis results to MongoDB, which fed a dashboard where the business could view the results.
- Used Apache Kafka for importing real time network log data into HDFS
- Interacted with many teams to get the data sources for ingestion into HDFS.
- Used Sort Merge Bucket (SMB) joins to reduce run time in Hive analysis.
- Created various documents such as the Source-to-Target Data Mapping document, Unit Test Cases, and the Data Migration document.
- Worked with data serialization formats such as JSON and XML to convert complex objects into sequences of bits.
- Testing and providing valid test data to users as required.
- Collecting and aggregating large amounts of data, using proper data modeling in the Hive metastore, and storing it in HDFS for further analysis.
- Developed Hive scripts for end user / analyst requirements to perform ad hoc analysis
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
- Troubleshooting, diagnosing, tuning, and solving Hadoop issues.
- Performed code reviews of MapReduce programs and Hive and Pig scripts.
Confidential, Bellevue, WA
Sr. Hadoop Engineer
Responsibilities:
- Involved in extracting customers' big data from various sources into Hadoop HDFS, including Excel files, ERP systems, databases, and server log data.
- Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers/sensors.
- Developed a framework that automates importing data into HDFS and cleansing data from heterogeneous sources to make it suitable for ingestion into a Hive schema for analysis.
- Worked hands-on with the DevOps team to meet specific business requirements for individual customers and proposed Hadoop solutions across multiple verticals.
- Used Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns.
- Collected user activity data and log data using Kafka for real-time analytics.
- Installed and configured a multi-node, fully distributed Hadoop cluster with Hortonworks Ambari.
- Created Hive tables as required, defining them as internal or external tables with appropriate static and dynamic partitions for efficiency.
- Wrote multiple Pig UDFs to meet business requirements.
- Used Sort Merge Bucket (SMB) joins to reduce run time in Hive analysis.
- Used the RegEx, JSON, and Avro SerDes packaged with Hive for serialization and deserialization to parse the contents of streamed log data, and implemented custom Hive UDFs.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
- Worked with BI teams in generating the reports and designing ETL workflows.
Confidential, Schaumburg, IL
Hadoop Developer
Responsibilities:
- Involved in installing, configuring and managing Hadoop Ecosystem components like Hive, Pig, Sqoop and Flume.
- Migrated the existing data to Hadoop from RDBMS (SQL Server and Oracle) using Sqoop for processing the data.
- Responsible for loading and managing unstructured and semi-structured data from different sources into the Hadoop cluster using Flume.
- Developed MapReduce programs to cleanse and parse data in HDFS obtained from various data sources and to perform map-side joins using the distributed cache.
- Used Hive data warehouse tool to analyze the data in HDFS and developed Hive queries.
- Created internal and external tables with properly defined static and dynamic partitions for efficiency.
- Used the RegEx, JSON, and Avro SerDes packaged with Hive for serialization and deserialization to parse the contents of streamed log data.
- Implemented custom Hive UDFs to achieve comprehensive data analysis.
- Used Pig to develop ad-hoc queries.
- Exported the business required information to RDBMS using Sqoop to make the data available for BI team to generate reports based on data.
- Implemented daily workflow for extraction, processing and analysis of data with Oozie.
- Responsible for troubleshooting MapReduce jobs by reviewing the log files.
- Used PDI (Pentaho Data Integration) to extract, transform, and load (ETL) data using a metadata-driven approach.
Confidential
Jr. Java Developer
Responsibilities:
- Worked with requirement analysis team to gather software requirements for application development.
- Designed UML and entity relationship diagrams for the process flow and database design.
- Developed java programs to implement the computational logic for the web applications.
- Designed a static web user interface with HTML and CSS.
- Developed prototype test screens in HTML and JavaScript.
- Involved in developing JSPs for client data presentation and client-side data validation within the forms.
- Documented the application's functionality and enhanced features.
- Created connections through JDBC and used JDBC statements to call stored procedures.
- Developed custom packages to connect to standard data sources and retrieve data efficiently eliminating the need for each team to rewrite the same set of code multiple times.
- Developed unit tests using the JUnit testing framework.
- Actively involved in code review and bug fixing for improving the performance.
- Worked on product deployment, documentation and support.
- Involved in structuring Wiki and Forums for product documentation
- Involved in R&D, setup, and design of MediaWiki, PHP, and Joomla content management systems.
- Maintained the customer support portal.
