Sr. Hadoop Engineer Resume
Atlanta, GA
SUMMARY
- 6+ years of total IT experience, including 3 years in Hadoop and Big Data technologies.
- Strong experience using Hadoop ecosystem components such as HDFS, MapReduce, Oozie, Pig, Hive, Sqoop, Flume, Kafka, Impala, Drill, HBase and ZooKeeper.
- Excellent understanding of classic MapReduce, YARN and their applications in Big Data analytics.
- Experience in working with Spark and Storm.
- Experience in installing, configuring and maintaining Hadoop clusters, including YARN configuration, using the Cloudera and Hortonworks distributions.
- Hands on experience in implementing secure authentication for Hadoop Cluster by using Kerberos.
- Experience in benchmarking Hadoop clusters to tune them for the best achievable performance.
- Familiar with all stages of Software Development Life Cycle, Issue Tracking, Version Control and Deployment.
- Extensively worked in writing, tuning and profiling jobs in MapReduce using Java.
- Experience in writing MRUnit tests to verify the correctness of MapReduce programs.
- Expertise in writing shell scripts, cron automation and regular expressions.
- Hands on experience in dealing with Compression Codecs like Snappy, BZIP2.
- Implemented workflows in Oozie using Sqoop, MapReduce, Hive and other Java and Shell actions.
- Excellent knowledge of the data flow lifecycle and of implementing transformations and analytic solutions.
- Extended Hive and Pig core functionality by writing custom UDFs (a minimal sketch follows this summary).
- Expertise in creating custom SerDes in Hive.
- Excellent knowledge in NoSQL databases like HBase, Cassandra and MongoDB.
- Expertise in implementing Data-Mining techniques like social network analysis and sentiment analysis.
- Working knowledge of data warehousing with tools such as Tableau and IBM DB2 Warehouse Edition.
- Extensively worked on database applications using DB2, Oracle, Teradata, MySQL and PL/SQL.
- Hands on experience in application development using Java, RDBMS.
- Experience as a Java Developer in Web/intranet, Client/Server technologies using Java, J2EE, Servlets, JSP, EJB, and JDBC.
- Expertise in implementing Database projects which includes Analysis, Design, Development, Testing and Implementation of end-to-end IT solutions.
- Experience in end-to-end implementation with data warehousing teams; strong understanding of data warehousing concepts and exposure to data modeling, normalization and business process analysis.
- Experience in Object Oriented Analysis, Design (OOAD) and development of software using UML Methodology, good knowledge of J2EE design patterns and Core Java design patterns.
- Good knowledge of popular frameworks like Struts, Hibernate, and Spring MVC.
- Proven ability to work with senior-level business managers and understand the key business drivers that impact their satisfaction.
- Experience in Agile engineering practices; excellent interpersonal and communication skills; creative, research-minded, technically competent and results-oriented, with problem-solving and leadership skills.
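To illustrate the custom UDF work noted above, here is a minimal sketch of a Hive UDF in Java; the class name, column semantics and normalization rule are hypothetical and not taken from any project listed here.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical UDF: normalizes a free-text column before analysis.
    public final class NormalizeText extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null; // pass NULLs through unchanged
            }
            // Lower-case the value and collapse runs of whitespace.
            String normalized = input.toString().trim().toLowerCase().replaceAll("\\s+", " ");
            return new Text(normalized);
        }
    }

Once packaged into a JAR, such a UDF would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION, then called like any built-in function in a SELECT.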
TECHNICAL SKILLS:
Big Data Technologies: Hadoop, MapReduce, YARN, Pig, Hive, HBase, Flume, Storm, Kafka, Sqoop, Impala, Oozie, Zookeeper, Spark, Elasticsearch, Kibana, Avro, Parquet.
Hadoop Distributions: Cloudera, HDP, Amazon Web Services.
Java Technologies: Java, J2EE, JavaBeans, Struts, Hibernate, JSP, Servlets, EJB, SOA, JDBC, Spring.
NoSQL: Cassandra, MongoDB, HBase.
Databases and DB Languages: MySQL, Oracle, DB2, PL/SQL.
Operating Systems: UNIX, Linux, MS-DOS, Windows, Mac OS.
Programming Languages: C, C++, C# (.NET).
Scripting Languages: JavaScript, PHP
Web Technologies: HTML5, XML, JSON.
Learning Management Systems: D2L
Methodologies: Agile, UML, Design Patterns
Application Servers: Apache Tomcat 5.x/6.0, GlassFish 3.1.2.2.
Build Tools: Maven, Ant
Analysis and Reporting Tools: Splunk, Tableau
IDEs: NetBeans, Eclipse, Visual Studio 2010
Testing API: JUnit, MRUnit
Tools: Bzip2, Snappy, Microsoft Office, JIRA, Prezi.
PROFESSIONAL EXPERIENCE
Confidential, Atlanta, GA
Sr. Hadoop Engineer
Responsibilities:
- Involved in creating Hive tables and in loading and analyzing data using Hive queries; responsible for managing data from multiple sources.
- Designed, developed and maintained data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional source systems, as well as RDBMS and NoSQL data stores, for data access and analysis.
- Assisted in exporting analyzed data to relational databases using Sqoop.
- Created Pig Latin scripts to sort, group, join and filter the enterprise-wide data.
- Implemented a custom Partitioner in MapReduce (see the sketch after this list).
- Responsible for building scalable distributed data solutions using Hadoop.
- Installed and configured Hadoop, Hive, Pig, Sqoop, Flume and Kafka on the Hadoop cluster.
- Developed simple to complex MapReduce jobs.
- Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Involved in the complete implementation lifecycle; specialized in writing custom MapReduce, Hive and Pig programs. Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Hands-on experience with Spark, used for data transformation on larger data sets.
- Implemented advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
- Worked on AWS, creating instances and load balancers and monitoring their health status.
- Designed and coded efficient, reliable and scalable AWS infrastructure.
- Automated the AWS infrastructure, improving code coverage, speeding up build jobs and enabling no-downtime automatic pushes to production.
- Extensively used HiveQL queries to search for particular strings in Hive tables stored in HDFS.
- Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
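A minimal sketch of the custom partitioning mentioned above, assuming a hypothetical composite key of the form "region|id" whose region prefix decides which reducer receives the record:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Hypothetical partitioner: routes every record for one region to the same reducer.
    public class RegionPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            String region = key.toString().split("\\|", 2)[0];
            // Mask the sign bit so the partition index is always non-negative.
            return (region.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

The partitioner would be wired into the job driver with job.setPartitionerClass(RegionPartitioner.class).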
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, HBase, AWS, Sqoop, Kafka, ZooKeeper, Flume, MongoDB, Java, Oracle 11g, MySQL, Linux
Confidential, Basking Ridge, NJ
Hadoop Developer
Responsibilities:
- Worked on analyzing, writing Hadoop MapReduce jobs using Java API, Pig and Hive.
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in loading data from edge node to HDFS using shell scripting.
- Worked on installing the cluster, commissioning and decommissioning of data nodes, NameNode high availability, capacity planning, and slot configuration.
- Implemented Partitioning, Dynamic Partitioning, Buckets in Hive.
- Developed Pig scripts using Pig Latin.
- Involved in managing and reviewing Hadoop log files.
- Exported data using Sqoop from HDFS to Teradata on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Created HBase tables to store variable-format PII data coming from different portfolios (see the sketch after this list).
- Implemented test scripts to support test driven development and continuous integration.
- Worked on tuning the performance of Pig queries.
- Responsible for managing data coming from different sources.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Wrote Hive queries and UDFs.
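As a sketch of how variable-format PII records might be written to one of the HBase tables described above (table name, column family and row-key scheme are hypothetical), using the standard HBase 1.x client API:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PiiRecordWriter {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("pii_records"))) {
                // Row key combines portfolio and customer id; all values here are placeholders.
                Put put = new Put(Bytes.toBytes("portfolioA#cust-0001"));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("ssn_last4"), Bytes.toBytes("1234"));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("source"), Bytes.toBytes("portfolioA"));
                table.put(put);
            }
        }
    }

Because HBase columns are created per row, each portfolio can carry its own set of qualifiers under the same column family, which suits variable-format data.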
Environment: Apache Hadoop, Apache Spark, MapReduce, HDFS, Hive, Java, Pig, MongoDB, Linux, XML, MySQL, MySQL Workbench, Java 6, Eclipse, PL/SQL, SQL connector, Subversion.
Confidential, Boise, Idaho
Hadoop Engineer
Responsibilities:
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Involved in setting up the Hadoop cluster along with Hadoop Administrator.
- Worked on a Hadoop cluster which ranged from 10-15 nodes during the pre-production stage and at times up to 40 nodes during production.
- Installed and configured Hadoop Ecosystem components.
- Imported the data from Oracle source and populated it into HDFS using Sqoop.
- Developed a data pipeline using Kafka and Storm to store data into HDFS.
- Performed real time analysis on the incoming data.
- Configured Kafka to write the data into ElasticSearch via the dedicated consumer.
- Automated the process of extracting data from warehouses and weblogs by developing workflows and coordinator jobs in Oozie.
- Performed transformations such as event joins, filtering of bot traffic and some pre-aggregations using Pig.
- Developed MapReduce jobs to convert data files into Parquet file format.
- Wrote MRUnit tests to verify the correctness of MapReduce programs (see the sketch after this list).
- Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
- Developed business specific Custom UDF's in Hive, Pig.
- Configured Oozie workflow to run multiple Hive and Pig jobs which run independently with time and data availability.
- Optimized MapReduce code and Pig scripts; performed performance tuning and analysis.
- Implemented a POC with Spark SQL to interpret JSON records.
- Created table definition and made the contents available as a Schema-Backed RDD.
- Implemented advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
- Exported the aggregated data onto Oracle using Sqoop for reporting on the Tableau dashboard.
- Involved in the design, development and testing phases of the Software Development Life Cycle.
- Performed Hadoop installation, updates, patches and version upgrades when required.
- Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
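A minimal sketch of the MRUnit testing mentioned above; EventCountMapper is a hypothetical mapper that emits (eventType, 1) for each tab-separated log line, standing in for the project's actual mappers:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mrunit.mapreduce.MapDriver;
    import org.junit.Before;
    import org.junit.Test;

    public class EventCountMapperTest {
        private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

        @Before
        public void setUp() {
            // Hypothetical mapper under test.
            mapDriver = MapDriver.newMapDriver(new EventCountMapper());
        }

        @Test
        public void emitsEventTypeWithCountOfOne() throws Exception {
            mapDriver.withInput(new LongWritable(1L), new Text("2015-03-01\tclick\tuser42"))
                     .withOutput(new Text("click"), new IntWritable(1))
                     .runTest();
        }
    }

MRUnit's MapDriver feeds the mapper a single input record in memory and asserts the expected output, so the logic can be verified without a running cluster.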
Environment: CDH4, CDH5, Eclipse, CentOS Linux, HDFS, MapReduce, Kafka, Storm, Elasticsearch and Kibana, Parquet, Pig, Hive, Sqoop, Spark, Spark SQL, Oracle, Oozie, Red Hat Linux, Tableau.
Confidential
Java Developer
Responsibilities:
- Responsible for analysis, design, development and integration of UI components with the backend using J2EE technologies such as Servlets, JavaBeans, JSP and JDBC.
- Used Spring Framework 3.2.2 for transaction management and Hibernate 3 to persist data to the database.
- Developed JSPs for user interfaces; the JSPs use JavaBeans objects to produce responses.
- Created controller Servlets for handling HTTP requests from JSP pages (see the sketch after this list).
- Wrote JavaScript functions for various validation purposes.
- Implemented the presentation layer using the Struts 2 MVC framework.
- Designed HTML Web pages utilizing JavaScript and CSS.
- Involved in developing distributed, transactional, secure and portable applications based on Java using EJB technology.
- Deployed web applications on the WebLogic server by creating data sources and uploading JARs.
- Created connection pools and configured the deployment descriptor specifying the data environment.
- Implemented multithreading concepts in Java classes to avoid deadlocks.
- Involved in High Level Design and prepared Logical view of the application.
- Involved in designing and developing object-oriented methodologies using UML; created use case, class and sequence diagrams, and took part in the complete development, testing and maintenance process of the application.
- Created core Java interfaces and abstract classes for different functionalities.
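A minimal sketch of a controller Servlet of the kind described above; the parameter name and JSP views are hypothetical:

    import java.io.IOException;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Hypothetical controller: validates a form parameter and forwards to a JSP view.
    public class LoginControllerServlet extends HttpServlet {
        @Override
        protected void doPost(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            String user = request.getParameter("username");
            String view = (user == null || user.trim().isEmpty()) ? "/login.jsp" : "/home.jsp";
            request.setAttribute("username", user);
            request.getRequestDispatcher(view).forward(request, response);
        }
    }

The JSP named in the forward renders the response, keeping presentation in the view layer while the Servlet handles request flow.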
Environment: Java/J2EE, CSS, AJAX, XML, JSP, JS, Struts 2, Hibernate 3, Spring Framework 3.2, Web Services, EJB3, Oracle, JUnit, Windows XP, WebLogic Application Server, Ant 1.8.2, Eclipse 3.x, SOA tool.
Confidential
Software Engineer
Responsibilities:
- Gathered requirements from end users and created functional requirements.
- Used WebSphere for developing use cases, sequence diagrams and preliminary class diagrams for the system in UML.
- Extensively used WebSphere Studio Application Developer for building, testing and deploying applications.
- Used the Spring Framework based on the Model View Controller (MVC) pattern; designed GUI screens using HTML and JSP.
- Developed the presentation layer and GUI framework in HTML and JSP, with client-side validations.
- Wrote Java code that generated XML documents, which were then transformed via XSLT into HTML for presentation in the GUI.
- Implemented XQuery and XPath for querying and node selection over the client input XML files to create Java objects (see the sketch after this list).
- Used WebSphere to develop the Entity Beans where transactional persistence was required, and JDBC to connect to the MySQL database.
- Developed the user interface using JSP pages and DHTML to produce dynamic HTML pages.
- Developed Session Beans on WebSphere for the transactions in the application.
- Utilized WSAD to create JSPs, Servlets and EJBs that pulled information from a DB2 database and sent it to a front-end GUI for end users.
- In the database end, responsibilities included creation of tables, triggers, stored procedures, sub-queries, joins, integrity constraints and views.
- Worked on MQSeries with J2EE technologies (EJB, JavaMail, JMS, etc.) on the WebSphere server.
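A minimal sketch of the XPath-based extraction mentioned above; the XML structure and element names are hypothetical stand-ins for the client input files:

    import java.io.StringReader;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.NodeList;
    import org.xml.sax.InputSource;

    public class OrderXmlReader {
        public static void main(String[] args) throws Exception {
            String xml = "<orders><order><id>42</id><amount>19.95</amount></order></orders>";
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Select every <order> node, then read its child values to build Java objects.
            NodeList orders = (NodeList) xpath.evaluate("/orders/order", doc, XPathConstants.NODESET);
            for (int i = 0; i < orders.getLength(); i++) {
                String id = xpath.evaluate("id", orders.item(i));
                String amount = xpath.evaluate("amount", orders.item(i));
                System.out.println("Order id=" + id + " amount=" + amount);
            }
        }
    }

In the real application the selected values would populate domain objects rather than being printed.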
Environment: Java, EJB, IBM WebSphere Application Server, Spring, JSP, Servlets, JUnit, JDBC, XML, XSLT, CSS, DOM, HTML, MySQL, JavaScript, Oracle, UML, ClearCase, ANT.