Sr. Hadoop Developer Resume
Chicago, IL
SUMMARY:
- 6 years of experience in the information technology industry, with strong exposure to software project management, design, development, implementation, maintenance/support, and integration of software applications.
- Good knowledge of the Hadoop Distributed File System and ecosystem components such as MapReduce, Hive, Pig, HBase, ZooKeeper, Flume, Splunk, Sqoop, Storm, Kafka, Oozie, Spark Streaming, and the core Spark API.
- Excellent understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Hands-on experience importing/exporting data with Sqoop, the Hadoop data-management tool.
- Analyzed data through HiveQL, Pig Latin, and MapReduce programs in Java; extended Hive and Pig core functionality by implementing custom UDFs.
- Experience working with Java/J2EE, JDBC, JSP, Eclipse, JavaBeans, EJB, and Servlets. Implemented partitioning and bucketing in Hive for more efficient querying of data.
- Excellent understanding and knowledge of NoSQL databases such as MongoDB, HBase, and Cassandra.
- Experience with distributed systems, large-scale non-relational data stores, RDBMS, NoSQL map-reduce systems, data modeling, database performance, and multi-terabyte data warehouses.
- Extensively involved in the design, development, tuning, and maintenance of HBase and Cassandra databases.
- Supported development, testing, and operations teams during new system deployments; evaluated and proposed new tools and technologies to meet the needs of the organization.
- Hands-on experience with Apache, Cloudera, and Hortonworks Hadoop distributions.
- Experienced in using Jenkins and Maven 3.3 to compile, package, and deploy to application servers.
- Hands-on experience importing large volumes of data from multiple sources into Power BI Desktop, and transforming and shaping the data after import.
- Expert in understanding data and designing/implementing enterprise platforms such as Hadoop data lakes and large data warehouses.
- Proficient in developing web pages quickly and effectively using HTML5, CSS3, JavaScript, and jQuery, with experience making web pages cross-browser compatible.
- Proven experience using application servers such as WebSphere, WebLogic, JBoss, and Tomcat.
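The Hive partitioning and bucketing noted above can be sketched in HiveQL; the table and column names here are illustrative only, not from a specific engagement:

```sql
-- Illustrative HiveQL DDL: partition by ingest date, bucket by user id
CREATE EXTERNAL TABLE IF NOT EXISTS web_events (
  user_id   BIGINT,
  url       STRING,
  duration  INT
)
PARTITIONED BY (event_date STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS
STORED AS ORC
LOCATION '/data/warehouse/web_events';

-- Filtering on the partition column prunes whole directories,
-- and bucketing on user_id speeds up joins and sampling on that key.
SELECT user_id, COUNT(*) AS hits
FROM web_events
WHERE event_date = '2016-01-01'
GROUP BY user_id;
```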
TECHNICAL SKILLS:
Hadoop Ecosystem: MapReduce, HDFS, HBase, Hive, Pig, Sqoop, ZooKeeper, Kafka, Oozie; NoSQL: MongoDB, Cassandra
Other Technologies: XML, HTML, XHTML, JNDI, HTML5, AJAX, jQuery, CSS, JavaScript, AngularJS, VBScript, WSDL, SOAP, JDBC, ODBC; Architectures: REST, MVC
Programming Languages: Java (JDK 1.4/1.5/1.6), C/C++, SQL, Teradata SQL, PL/SQL, Servlets, JavaBeans, JDBC, JNDI, JTA, JPA
Database Servers: MS SQL Server, MySQL, Oracle 9i/10g, MS Access, Teradata V2R5
Operating Systems: Windows Server, Windows XP/Vista, Mac OS, UNIX, Linux
Methodologies: Agile, Scrum, MVC, SDLC
PROFESSIONAL EXPERIENCE:
Sr. Hadoop Developer
Confidential, Chicago, IL
Responsibilities:
- Implemented several scheduled Spark, Hive, and MapReduce jobs on the MapR Hadoop distribution.
- Deployed several process-oriented scheduled jobs through crontab and event engines, using wrapper scripts to invoke the Spark module.
- Developed various main and service classes in Scala using Spark SQL for requirement-specific tasks.
- Leveraged a Lambda Architecture using the Akka framework and Kafka connectors, with HBase schema designs.
- Great familiarity with Hive joins; used HiveQL for querying the databases, eventually leading to complex Hive UDFs.
- Expert in the UNIX environment, e.g. changing file and group permissions; strong ability to work at the command-line interface.
- Involved in building the runnable jars for the module framework through Maven clean & Maven dependencies.
- Involved in Data Validation and fixing discrepancies by working in coordination with the Data Integration and Infra Teams.
- Implemented DStreams over resilient distributed datasets (RDDs) through various window operations, while simultaneously updating log files for the streams.
- Extensive experience with Spark Streaming (1.5.2) through the core Spark API, using Scala and Java to transform raw data from several data sources into baseline data.
- Drove POC initiatives to evaluate the feasibility of different traditional and big data reporting tools against the data lake.
- Hands-on expertise running Spark and Spark SQL; implemented Spark batch jobs.
- Developed tasks and set up the required environment for running Hadoop in the cloud on various instances.
- Developed Hive (0.11.0.2) and Impala (2.1.0 & 1.3.1) queries for end users/analysts to perform ad hoc analysis.
- Performed Unit Testing & Integration with sample test cases and assisted QA Team and addressed several performance issues according to the Business Unit requirements.
- Involved in data integration from HDFS to Power BI to perform data analytics and visualization.
- Proven expertise in handling the exception scenarios while handling the errored feed data in coordination with the Data Architects, Data Integration team, Business Partners and Stakeholders.
- Participated in regular stand-up meetings, status calls, and business-owner meetings with stakeholders and risk management teams in an Agile environment.
- Strong understanding of the high-level architecture of the business logic, decomposing complex modules into simple, achievable tasks for efficient development.
- Strong team player, shouldering responsibility for issues raised against the team both collectively and individually whenever it matters.
- Extensive knowledge in NoSQL databases like HBase. Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and AWS cloud.
- Responsible for continuous monitoring and managing Elastic MapReduce (EMR) cluster through AWS console.
Environment: MapR Hadoop Distribution, Hive, Scala, HBase, Sqoop, Maven builds, Spark, Spark SQL, Oozie, Linux/UNIX, SVN, Talend, Lambda, Akka, Kafka, Power BI.
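The windowed DStream processing described above was done in Scala on Spark; as a framework-free illustration of the underlying sliding-window logic (not the Spark API itself), a single-process Python sketch with hypothetical data:

```python
from collections import Counter, deque

def windowed_counts(batches, window_len):
    """Yield, per micro-batch, key counts over the last `window_len` batches.

    Mimics the effect of counting keys over a sliding DStream window;
    Spark computes this distributed over RDDs, here it is a local sketch.
    """
    window = deque(maxlen=window_len)  # oldest batches fall out automatically
    for batch in batches:
        window.append(Counter(batch))
        total = Counter()
        for counts in window:
            total.update(counts)       # aggregate across the current window
        yield dict(total)

# Hypothetical example: 3 micro-batches of event keys, window of 2 batches
batches = [["a", "b"], ["b", "b"], ["c"]]
results = list(windowed_counts(batches, window_len=2))
# results[1] counts over batches 1-2; results[2] over batches 2-3
```

In real Spark Streaming the same effect comes from window operations such as `reduceByKeyAndWindow` over a DStream.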
Hadoop Developer
MSRB, Washington, DC
Responsibilities:
- Worked on analyzing Hadoop stack and different big data analytic tools including Pig, Hive, HBase database and Sqoop.
- Designing and implementing semi-structured data analytics platform leveraging Hadoop.
- Worked on performance analysis and improvements for Hive and Pig scripts at the MapReduce job-tuning level.
- Installed and configured the Hadoop cluster, working with the Cloudera support team to fine-tune it. Developed a custom file system plugin so Hadoop can access files on the Hitachi Data Platform.
- Developed connectors for Elasticsearch and Greenplum for data transfer from a Kafka topic.
- Performed data ingestion from multiple internal clients using Apache Kafka; developed Kafka Streams applications in Java for real-time data processing. Involved in optimization of Hive queries.
- Developed a framework to load and transform large sets of unstructured data from UNIX systems into Hive tables.
- Involved in Data Ingestion to HDFS from various data sources.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Extensively used Apache Sqoop for efficiently transferring bulk data between Apache Hadoop and relational databases.
- Automated Sqoop, Hive, and Pig jobs using Oozie scheduling. Extensive knowledge of NoSQL databases such as HBase.
- Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and AWS cloud.
- Responsible for continuous monitoring and managing Elastic MapReduce (EMR) cluster through AWS console.
- Good knowledge of writing and using user-defined functions in Hive, Pig, and MapReduce.
- Helped business team by installing and configuring Hadoop ecosystem components along with Hadoop admin.
- Developed multiple Kafka Producers and Consumers from scratch as per the business requirements.
- Worked on loading log data into HDFS through Flume
- Created and maintained technical documentation for executing Hive queries and Pig Scripts.
- Worked on debugging and performance tuning of Hive and Pig jobs.
- Used Oozie to schedule various jobs on the Hadoop cluster. Used Hive to analyze the partitioned and bucketed data.
Environment: Hortonworks 2.4, Hadoop, HDFS, MapReduce, MongoDB, Cloudera, Java, VMware, Hive, Eclipse, Pig, HBase, AWS, Tableau, Sqoop, Flume, Linux/UNIX
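The Kafka producers and consumers above were built with the Java kafka-clients API; as an illustrative sketch of the underlying append-log and consumer-offset semantics (an in-memory toy, not the real API), in Python:

```python
from collections import defaultdict

class InMemoryTopic:
    """Toy single-partition topic mimicking Kafka's append-only log plus
    per-consumer-group committed offsets. Purely illustrative."""

    def __init__(self):
        self.log = []                    # append-only message log
        self.offsets = defaultdict(int)  # committed offset per consumer group

    def produce(self, message):
        self.log.append(message)         # producer appends; position = offset

    def poll(self, group, max_records=10):
        """Return the next uncommitted records for this consumer group."""
        start = self.offsets[group]
        return self.log[start:start + max_records]

    def commit(self, group, count):
        """At-least-once delivery: commit only after records are processed."""
        self.offsets[group] += count

topic = InMemoryTopic()
for m in ("evt1", "evt2", "evt3"):
    topic.produce(m)

records = topic.poll("etl-group", max_records=2)  # first two events
topic.commit("etl-group", len(records))           # mark them processed
remaining = topic.poll("etl-group")               # only the third remains
```

Committing after processing (rather than after polling) is what gives the at-least-once guarantee: a crash between poll and commit simply redelivers the batch.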
Hadoop Developer
Confidential, San Francisco, CA
Roles & Responsibilities:
- Worked on a live 60-node Hadoop cluster running CDH 5.4.4, CDH 5.2.0, and CDH 5.2.1.
- Involved in migrating the map reduce jobs into Spark Jobs and Used Spark SQL and Data frames API to load structured and semi structured data into Spark Clusters.
- Involved in requirement and design phase to implement Streaming Lambda Architecture to use real time streaming using Spark and Kafka.
- Worked with highly unstructured & semi structured data of 90 TB (270 TB with replication factor of 3)
- Implemented DStreams over resilient distributed datasets (RDDs) through various window operations, while simultaneously updating log files for the streams.
- Extensive experience with Spark Streaming (1.5.2) through the core Spark API, using Python, Scala, and Java scripts to transform raw data from several data sources into baseline data.
- Hands-on expertise running Spark and Spark SQL; implemented Spark batch jobs.
- Implemented various MapReduce jobs in custom environments, updating HBase tables via generated Hive queries.
- Used Sqoop for various file transfers through the HBase tables, processing data into several NoSQL databases.
- Developed Hive UDFs and reused them in other requirements. Worked on performing join operations.
- Involved in creating partitions on external tables. Good hands-on experience writing HiveQL statements per user requirements.
- Implemented a Cassandra connector for Spark in Java; implemented the Cassandra connection with resilient distributed datasets (local and cloud).
- Data visualization and reporting was performed through Tableau & Talend.
- Very good understanding of Partitions, Bucketing concepts in Hive and designed both Managed and External tables in Hive to optimize performance.
- Solved performance issues in Hive and Pig scripts by understanding joins, grouping, and aggregation, and how they translate to MapReduce jobs.
- Developed UDFs in Scala, Java, and Python as needed for use in Pig and Hive queries.
- Developed Oozie workflow for scheduling and orchestrating the ETL process
Environment: Cloudera CDH4/CDH5, Teradata, Amazon Web Services, HBase, Scala, Python, Java 1.7.x, Hive, Sqoop, Splunk, Storm, Spark, Flume, Avro, Oozie, CentOS, Ambari, Oracle, SVN, Kafka, Data Lake, GitHub, JIRA, Talend.
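One common shape for the Python UDFs mentioned above is a Hive TRANSFORM streaming script: Hive pipes tab-separated rows to the script's stdin and reads tab-separated rows back. A minimal sketch, with a hypothetical (user_id, url) column layout:

```python
import sys

def transform_row(line):
    """Map one tab-separated Hive row (user_id, url) to (user_id, domain).

    Equivalent in spirit to a UDF extracting the domain from a URL.
    """
    user_id, url = line.rstrip("\n").split("\t")
    domain = url.split("//")[-1].split("/")[0]  # strip scheme, keep host
    return "\t".join([user_id, domain])

def main():
    # Hive would invoke this script roughly as:
    #   SELECT TRANSFORM(user_id, url) USING 'python url_udf.py'
    #          AS (user_id, domain) FROM web_events;
    for row in sys.stdin:
        print(transform_row(row))
```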
Java Developer
Confidential
Roles & Responsibilities:
- Involved in design, development, and analysis documents shared with clients.
- Analysis and Design of the Object models using JAVA/J2EE Design Patterns in various tiers of the application.
- Worked with RESTful web services and WSDL. Used the Jenkins and Maven build tools to build the project.
- Involved in Coding JavaScript code for UI validation and worked on Struts validation frameworks.
- Worked on implementing directives and scope values using AngularJS, JSON for an existing webpage.
- Familiar with state-of-the-art standards and design processes for creating optimal UIs using Web 2.0 technologies such as AngularJS, Node.js, AJAX, JavaScript, CSS, and XSLT.
- Involved in the Preparation of Program Specification and Unit Test Case Document.
- Involved in mapping of all configuration files according to the JSF Framework
- Written SQL, PL/SQL and stored procedures as part of database interaction.
- Testing and production support of a core-Java-based multithreaded ETL tool for distributed loading of XML data into an Oracle 11g database using JPA/Hibernate.
- Used Hibernate framework and Spring JDBC framework modules for backend communication in the extended application.
- Developed Presentation Layer using HTML, CSS, and JSP and validated the data using AJAX and Ext JS and JavaScript.
- Involved in developing database connections and database operations using JDBC. Involved in writing SQL queries and stored procedures.
- Defined and Developed Action and Model Classes.
- Wrote Action Form and Action classes and used various Struts HTML, Bean, and Logic tags; configured struts-config.xml for global forwards, error forwards, and action forwards.
- Developed UI using JSP, JSON and Servlet and server-side code with Java.
- Used the JavaMail API to send email notifications to users.
- Worked on database design and implementation (Oracle). Prepared checklist and guidelines documentation.
- Developed Maven build scripts using Jenkins and involved in deploying the application on WebSphere.
- Created WSDLs per wireframes and UI pages, and generated client JARs using JAX-WS. Used Apache CXF to create SOAP-based and RESTful web services.
- Used SVN as version control repository.
Environment: Java/J2EE, JSP, JSON, Servlets, EJB, XML, XSLT, Struts, Rational Rose, Apache Struts Framework, Web Services, DB2, Beyond Compare, AngularJS, Node.js, GitHub, CVS, IBM WebSphere Studio Enterprise Developer, JUnit, Log4j, Windows XP, Red Hat Linux.
