Sr. Big Data Engineer Resume
Seattle, WA
PROFESSIONAL SUMMARY:
- 7+ years of professional IT experience, including 3+ years with Big Data Hadoop ecosystems covering ingestion, storage, querying, processing and analysis of big data.
- Excellent understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
- Proficient in installing, configuring, migrating and upgrading data across Hadoop MapReduce, Hive, HDFS, HBase, Sqoop, Oozie, Pig, Scala, ZooKeeper, Flume and Cassandra on Cloudera and Hortonworks distributions.
- Experience in installing, configuring, supporting and managing Cloudera's Hadoop platform, including CDH4 and CDH5 clusters.
- Experience in developing a data pipeline through Kafka-Spark API.
- Experience with leveraging Hadoop ecosystem components including Pig and Hive for data analysis, Sqoop for data migration, Oozie for scheduling and HBase as a NoSQL data store.
- Good exposure to Apache Hadoop MapReduce programming, Pig scripting, distributed applications and HDFS.
- Experience with the NoSQL databases MongoDB and Cassandra.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Experienced in deploying Hadoop clusters using Puppet.
- Experience in scheduling cron jobs on EMR, Kafka and Spark using Clover Server.
- Experience with Hadoop shell commands, writing MapReduce programs, and verifying, managing and reviewing Hadoop log files.
- Proficient in configuring ZooKeeper, Cassandra and Flume on an existing Hadoop cluster.
- In-depth knowledge of JobTracker, TaskTracker, NameNode, DataNode and MapReduce concepts.
- Experience in understanding Hadoop security requirements and integrating with Kerberos authentication and authorization infrastructure.
- Experience in big data analysis using Pig and Hive, with an understanding of Sqoop and Puppet.
- Good understanding of HDFS design, daemons, federation and high availability (HA).
- Experienced in developing MapReduce programs using Apache Hadoop for working with Big Data.
- Experience in developing custom UDFs in Java to extend Hive and Pig Latin functionality (a brief sketch follows this summary).
- Good experience in implementing and setting up standards and processes for Hadoop based application design and implementation.
- Experience with middleware architectures built on Sun Java technologies such as J2EE, JSP and Servlets, and application servers such as WebSphere and WebLogic.
- Familiarity with popular frameworks such as Struts, Hibernate, Spring MVC and AJAX.
- Experience in object-oriented programming with Java and Core Java.
- Experience in creating web-based applications using JSP and Servlets.
- Experience in managing Hadoop clusters using the Cloudera Manager tool.
- Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
- Extensive experience working with Oracle, DB2, SQL Server and MySQL databases.
- Hands-on experience with VPN, PuTTY, WinSCP, VNC Viewer, etc.
- Performed data manipulation using Informatica and Oracle software.
- Hands-on experience in application development using Java, RDBMS and Linux shell scripting.
- Performed database architecture responsibilities, tested data quality for load systems and provided data warehouse solutions.
- Ability to adapt to evolving technology and a strong sense of responsibility.
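A minimal sketch of the kind of custom Hive UDF referenced above; the class name, function behavior and column handling are illustrative assumptions rather than code from a specific project.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical UDF: trims and upper-cases a string column before analysis.
    public class NormalizeTextUDF extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;                 // pass NULLs through, as Hive expects
            }
            return new Text(input.toString().trim().toUpperCase());
        }
    }

In Hive such a class would be packaged in a JAR, registered with ADD JAR and CREATE TEMPORARY FUNCTION, and then called like any built-in function.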
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Oozie, Spark
Languages: Java, SQL, XML, C++, C, WSDL, XHTML, HTML, CSS, JavaScript, AJAX, PL/SQL.
Java Technologies: Java, J2EE, Hibernate, JDBC, Servlets, JSP, JSTL, JavaBeans, jQuery and EJB.
Frameworks: Struts and Spring.
ETL Tools: Informatica, Pentaho
Design and Modeling: UML and Rational Rose.
Web Services: SOAP, WSDL, UDDI, SDLC
Scripting languages: JavaScript, Shell Script
XML technologies: DTD, XSD, XML, XSL, XSLT, SAX, DOM, JAXP
Version Control: CVS, ClearCase, SVN
Databases: Oracle 10g/9i/8i, SQL Server, DB2, MS Access
Environments: UNIX, Red Hat Linux, Windows 2000/Server 2008/2007, Windows XP.
PROFESSIONAL EXPERIENCE:
Sr. Big Data Engineer
Confidential, Seattle, WA
Environment: Apache Hadoop, HDFS, Perl, Python, Pig, Hive, Impala, Java, Sqoop, Cloudera CDH5, Oracle, MySQL, Tableau, Talend, ZoomData, Hue, Storm, Data governance implementation.
Responsibilities:
- Understood and analyzed business requirements, high-level design and detailed design.
- Extensive scripting in Perl and Python.
- Design and Develop Parsers for different file formats (CSV, XML, Binary, ASCII, Text, etc.).
- Extensive usage of Cloudera Hadoop distribution.
- Executed parameterized Pig, Hive, Impala and UNIX batches in production.
- Managed big data in Hive and Impala (tables, partitioning, ETL, etc.).
- Design and Develop File Based data collections in Perl.
- Extensive Usage of Hue and other Cloudera tools.
- Used MRUnit with JUnit for unit testing MapReduce code (see the sketch after this list).
- Extensive usage of the NoSQL database HBase.
- Maintained System integrity of all sub-components (primarily HDFS, MR, HBase, Cassandra and Hive).
- Designed and developed dashboards in ZoomData and wrote complex queries.
- Worked on shell programming and crontab automation.
- Monitored system health and logs and responded to any warning or failure conditions.
- Worked extensively in UNIX and Red Hat environments.
- Experience with full development cycle of a Data Warehouse, including requirements gathering, design, implementation, and maintenance.
- Performed testing and bug fixing.
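The MapReduce unit testing mentioned above is sketched below using Apache MRUnit with JUnit (a common way to JUnit-test MapReduce code); the mapper under test (EventCountMapper) and its CSV input layout are assumptions for illustration.

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mrunit.mapreduce.MapDriver;
    import org.junit.Before;
    import org.junit.Test;

    // Hypothetical test: the mapper is expected to emit (event_type, 1) for each CSV record.
    public class EventCountMapperTest {
        private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

        @Before
        public void setUp() {
            mapDriver = MapDriver.newMapDriver(new EventCountMapper()); // mapper under test (assumed)
        }

        @Test
        public void emitsOneCountPerRecord() throws Exception {
            mapDriver.withInput(new LongWritable(0L), new Text("2016-01-01,click,user42"))
                     .withOutput(new Text("click"), new IntWritable(1))
                     .runTest(); // fails if the actual mapper output differs from the expected pair
        }
    }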
Sr. Hadoop Developer & Big Data Analyst
Confidential, Hartford, CT
Environment: Apache Hadoop, HDFS, Pig, Hive, Java, Sqoop, Cloudera CDH5, Oracle, MySQL, Tableau, Talend, Elasticsearch, Storm, Data governance implementation.
Responsibilities:
- Implemented a Spark Streaming framework that processes data from Kafka and performs analytics on top of it.
- Migrated the needed data from Oracle and MySQL into HDFS using Sqoop and imported flat files of various formats into HDFS.
- Proposed an automated system using shell scripts to run the Sqoop jobs.
- Worked in an Agile development approach using Storm (spouts and bolts), Flume and Kafka.
- Developed a data pipeline for data processing using the Kafka-Spark API (a sketch follows this list).
- Created the estimates and defined the sprint stages.
- Developed a strategy for Full load and incremental load using Sqoop.
- Mainly worked on Hive queries to categorize data of different claims.
- Integrated the Hive warehouse with HBase.
- Wrote custom Hive UDFs in Java where the required functionality was too complex for built-in functions.
- Implemented partitioning, dynamic partitions and buckets in Hive.
- Handled the integration and testing of the data warehouse for small and large applications.
- Generated final reporting data using Tableau for testing by connecting to the corresponding Hive tables through the Hive ODBC connector.
- Maintained System integrity of all sub-components (primarily HDFS, MR, HBase, Cassandra and Hive).
- Monitored system health and logs and responded to any warning or failure conditions.
- Presented data and dataflow using Talend for reusability.
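A minimal sketch of the Kafka-Spark pipeline referenced above, using the spark-streaming-kafka-0-10 direct stream in Java; the broker address, topic name, consumer group and claim-type parsing are illustrative assumptions.

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;
    import scala.Tuple2;

    // Hypothetical job: consume claim events from Kafka and count records per claim type per micro-batch.
    public class ClaimsStreamingJob {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("claims-kafka-spark");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker1:9092");        // assumed broker address
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "claims-analytics");              // assumed consumer group

            JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                    jssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(
                        Collections.singletonList("claims-events"), kafkaParams)); // assumed topic

            stream.mapToPair(record -> new Tuple2<>(record.value().split(",")[1], 1L))
                  .reduceByKey(Long::sum)
                  .print(); // the real pipeline would persist results to Hive/HBase instead

            jssc.start();
            jssc.awaitTermination();
        }
    }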
Hadoop Developer
Confidential, Houston, TX
Environment: Apache Hadoop, HDFS, Hive, Map Reduce, Java, Cloudera CDH4, Oozie, Oracle, MySQL, Amazon S3.
Responsibilities:
- Acted as a lead resource and built the entire Hadoop platform from scratch.
- Evaluated the suitability of Hadoop and its ecosystem for the project, implementing and validating various proof-of-concept (POC) applications before adopting them as part of the Big Data Hadoop initiative.
- Estimated the software and hardware requirements for the NameNode and DataNodes in the cluster.
- Extracted the needed data from the server into HDFS and bulk loaded the cleaned data into HBase using MapReduce (see the sketch after this list).
- Responsible for migrating workloads from Hadoop MapReduce to the Spark framework for in-memory distributed computing and real-time fraud detection.
- Responsible for writing programs to test data as well as developing the enterprise data warehouse application.
- Wrote MapReduce programs and Hive UDFs in Java.
- Used MRUnit with JUnit for unit testing MapReduce code.
- Developed Hive queries for the analysts.
- Executed parameterized Pig, Hive, Impala and UNIX batches in production.
- Created an e-mail notification service that alerts the requesting team upon job completion.
- Defined job workflows in Oozie according to their dependencies.
- Performed data cleansing, data quality tracking and process-balancing checkpoints.
- Created a flexible, scalable and reusable data model design emphasizing performance, data validation and business needs.
- Played a key role in productionizing the application after testing by BI analysts.
- Maintained system integrity of all Hadoop-related sub-components.
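A minimal sketch of loading cleaned HDFS data into HBase with a map-only MapReduce job, as referenced above; the table name, column family and CSV layout are assumptions, and this shows the TableOutputFormat write path (one common approach) rather than HFile bulk loading.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    // Hypothetical map-only job: parse cleaned CSV records from HDFS and write them into an HBase table.
    public class HdfsToHBaseLoad {

        public static class CsvToPutMapper
                extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(",");        // row_key,col1,col2 (assumed layout)
                Put put = new Put(Bytes.toBytes(fields[0]));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("col1"), Bytes.toBytes(fields[1]));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("col2"), Bytes.toBytes(fields[2]));
                context.write(new ImmutableBytesWritable(Bytes.toBytes(fields[0])), put);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            Job job = Job.getInstance(conf, "hdfs-to-hbase-load");
            job.setJarByClass(HdfsToHBaseLoad.class);
            job.setMapperClass(CsvToPutMapper.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));      // cleaned data directory in HDFS
            // Configures TableOutputFormat so the mapper's Puts go straight to the table.
            TableMapReduceUtil.initTableReducerJob("cleaned_data", null, job);
            job.setNumReduceTasks(0);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }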
Sr. Systems Engineer
Confidential
Environment: ATG, Java, JSP, Oracle 9i, 10g, WebLogic 10.3.5, SOAP, RESTful, SVN, SQL Developer, UNIX, Eclipse, XML, HTML, CSS, JavaScript, AJAX, jQuery, SCA.
Responsibilities:
- Understood and analyzed business requirements, high-level design and detailed design.
- Involved in three releases of versions eShop 2.0.1, eShop 2.1 & eShop 2.2.
- Provided high-level systems design, including class diagrams, sequence diagrams and activity diagrams.
- Utilized Java/J2EE Design Patterns - MVC at various levels of the application and ATG Frameworks
- Worked extensively on DCS (ATG Commerce Suite) using the commerce API to accomplish the Store Checkout.
- Expertise in developing JSPs and Servlets, with experience in web services (REST, SOAP); a minimal Servlet sketch follows this list.
- Served as DB Administrator, creating and maintaining all schemas
- Collaborated in design, development and maintenance of the Front-end for applications using JSP, JSTL, Custom Tags
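A minimal Servlet sketch of the kind referenced above; the class name, request parameter and JSP view are illustrative assumptions, and the ATG commerce lookup is stubbed out.

    import java.io.IOException;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Hypothetical Servlet: looks up an order status and forwards the result to a JSP view.
    public class OrderStatusServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            String orderId = request.getParameter("orderId");
            // In the real application this would call the ATG commerce API; stubbed here.
            String status = (orderId == null) ? "UNKNOWN" : "SUBMITTED";
            request.setAttribute("orderStatus", status);
            request.getRequestDispatcher("/orderStatus.jsp").forward(request, response);
        }
    }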
Java Developer
Confidential
Environment: JAVA, JSP 2.0, JavaScript, CSS, HTML, XML, Weblogic Application Server 8.1, Eclipse, Oracle 9i.
Responsibilities:
- Involved in development, testing and maintenance process of the application
- Used Struts framework to implement the MVC architecture
- Created JSP, Form Beans for effective way of implementing Model View Controller architecture
- Created session beans and entity beans for database transactions using JDBC
- Developed the necessary SQL queries for database transactions (a JDBC sketch follows this list)
- Developed and maintained the application configuration information in various properties files
- Designed and developed HTML front screens and validated user input using JavaScript
- Used Cascading Style Sheets (CSS) to give a better view to the web pages
- Used Eclipse for code development along with CVS for managing the code
- Performed testing and bug fixing.
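A minimal sketch of the JDBC transaction pattern behind the database work above; the connection URL, credentials, table and column names are illustrative assumptions, not details from the original application.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    // Hypothetical DAO: updates an account row inside an explicit JDBC transaction.
    public class AccountDao {
        public void updateAccountStatus(String accountId, String status) throws SQLException {
            String url = "jdbc:oracle:thin:@dbhost:1521:ORCL";   // assumed connection details
            try (Connection conn = DriverManager.getConnection(url, "appuser", "password")) {
                conn.setAutoCommit(false);                        // begin transaction
                try (PreparedStatement ps =
                         conn.prepareStatement("UPDATE accounts SET status = ? WHERE account_id = ?")) {
                    ps.setString(1, status);
                    ps.setString(2, accountId);
                    ps.executeUpdate();
                    conn.commit();                                // commit on success
                } catch (SQLException e) {
                    conn.rollback();                              // roll back on failure
                    throw e;
                }
            }
        }
    }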