Hadoop Developer Resume
Hazelwood, MO
SUMMARY
- Overall 7+ years of IT experience, including strong experience in the Big Data ecosystem and Java/J2EE technologies.
- Hands-on experience with the Hadoop ecosystem, including HDFS, Spark, Hive, Pig, Sqoop, Impala, Oozie, Flume, Kafka, HBase, ZooKeeper, and MapReduce
- Extensive understanding of Hadoop architecture, workload management, schedulers, scalability, and components such as YARN and MapReduce
- Hands-on experience with the RDD architecture, implementing Spark operations on RDDs and optimizing transformations and actions in Spark
- Exposure to Spark Streaming and Spark SQL in production environments
- Worked on building, configuring, monitoring, and supporting Cloudera Hadoop (CDH5) clusters
- Good knowledge of Hadoop cluster architecture and cluster monitoring
- Experience in managing and reviewing Hadoop log files
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs
- Extensive experience in data ingestion technologies, such as Sqoop, Flume, and Kafka
- Expertise in Java, Scala and scripting languages like Python
- Deep understanding and knowledge of NoSQL databases such as MongoDB, HBase, and Cassandra
- Involved in setting up standards and processes for Hadoop-based application design and implementation
- Well versed in importing and exporting data between HDFS and relational database systems using Sqoop
- Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE and Core Java design patterns
- Expert in managing Hadoop clusters using Cloudera Manager tool
- Involved in the complete project life cycle (design, development, testing, and implementation) of client-server and web applications
- Experience administering Red Hat Linux: installation, configuration, troubleshooting, security, backup, performance monitoring, and fine-tuning
- Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases; scripting to deploy monitors, checks, and automation of critical system administration functions
- Hands-on experience in application development using Java, RDBMS, and Linux shell scripting
- Experience in Java, JSP, Servlets, EJB, WebLogic, WebSphere, Hibernate, Spring, JBoss, JDBC, RMI, JavaScript, Ajax, jQuery, XML, and HTML
- Ability to adapt to evolving technology; strong sense of responsibility and accomplishment
- Conversant with Agile methodology standards and Test-Driven Development
TECHNICAL SKILLS
Big Data Skillset (Frameworks & Environments): Cloudera CDH, Hortonworks HDP, Hadoop 1.0, Hadoop 2.0, HDFS, MapReduce, Pig, Hive, Impala, HBase, Data Lake, Cassandra, MongoDB, Sqoop, Oozie, ZooKeeper, Flume, Apache Spark, Storm, Kafka, YARN, Falcon, Avro
Java & J2EE Technologies: Core Java (Java 8 & JavaFX), Hibernate framework, Spring framework, JSP, Servlets, Java Beans, JDBC, Java Sockets, JavaScript, jQuery, JSF, PrimeFaces, XML, EJB, HTML, XHTML, CSS, SOAP, XSLT and DHTML
Messaging Services: JMS, MQ Series, MDB
J2EE MVC Frameworks: Struts … Struts 2.1, Spring 3.2 MVC, Spring Web Flow, AJAX
IDE Tools: IntelliJ, PyCharm, Eclipse
Web Services & Technologies: XML, HTML, XHTML, HTML5, AJAX, jQuery, CSS, JavaScript, AngularJS, VBScript, WSDL, SOAP, JDBC, ODBC; Architectures: REST, MVC
Databases & Application Servers: Oracle (8i, 9i, 10g, 11i), MySQL, DB2, Cassandra, HBase, MongoDB, MS Access, Teradata, PostgreSQL
Other Tools: Putty, WinSCP, GitLab, GitHub, SVN, CVS
PROFESSIONAL EXPERIENCE
Confidential, San Francisco, CA
Big Data Engineer
Responsibilities:
- Attended daily stand-ups and other project meetings to gather new requirements.
- As part of the Center of Excellence team, triaged and investigated application issues and built and delivered fixes.
- Used Sqoop to pull data from different databases and store it in HBase and Hive.
- Gathered log data from various sources using Flume for further processing.
- Imported millions of structured records from relational databases using Sqoop, processed them with Spark, and stored the results in HDFS in CSV format, as illustrated in the sketch at the end of this role.
- Owned Control-M jobs and provided support for failing jobs.
- Improved performance and optimized existing algorithms in Hadoop using the Spark context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Worked on Elasticsearch to store, analyze, and search large volumes of data quickly.
- Worked extensively on Python scripts to automate jobs that pull data from DB2, Oracle, and Teradata.
- Wrote Scala code to help the team resolve issues related to PredictionIO.
- Set up experiments to provide top-level recommendations as part of the project.
- Participated in weekly showcases with business stakeholders and product managers to review changes and adopt new inputs.
- Generated output in different file formats such as JSON, Parquet, CSV, and TSV.
- Verified Splunk alerts.
Environment: HDP, Spark, HBase, Elasticsearch, Sqoop, Hive, Python, Scala, DB2, Oracle, Teradata, Flume, Splunk.
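A minimal PySpark sketch of the Sqoop-to-Spark-to-HDFS flow described in this role, illustrative rather than the production code; the HDFS paths, view name, and column names are assumptions:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("sqoop-landing-to-csv")
         .getOrCreate())

# Read a Sqoop-landed extract from HDFS (path and schema are illustrative).
orders = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs:///data/landing/orders/"))

# Aggregate with Spark SQL: total order amount per customer per day.
orders.createOrReplaceTempView("orders")
daily_totals = spark.sql("""
    SELECT customer_id, order_date, SUM(amount) AS total_amount
    FROM orders
    GROUP BY customer_id, order_date
""")

# Write the processed result back to HDFS in CSV format.
(daily_totals.write
 .mode("overwrite")
 .option("header", "true")
 .csv("hdfs:///data/processed/daily_totals/"))
```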
Confidential, Hazelwood, MO
Hadoop Developer
Responsibilities:
- Worked with Hadoop ecosystem components such as HBase, Sqoop, ZooKeeper, Oozie, Hive, and Pig on the Cloudera Hadoop distribution.
- Developed Pig and Hive UDFs in Java to extend Pig and Hive functionality, and wrote Pig scripts for sorting, joining, filtering, and grouping the data.
- Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Developed Spark programs using Scala, created Spark SQL queries, and developed Oozie workflows for Spark jobs.
- Prepared Oozie workflows with Sqoop actions to migrate data from relational databases such as Oracle and Teradata to HDFS.
- Developed Spark programs for faster data processing than standard MapReduce programs.
- Created Hive tables with dynamic partitions and buckets for sampling, and worked on them using HiveQL, as illustrated in the sketch at the end of this role.
- Used Sqoop to load data into HBase and Hive.
- Wrote Hive queries to analyze the data and generate end reports used by business users.
- Worked on scalable distributed computing systems, software architecture, data structures, and algorithms using Hadoop, Apache Spark, and Apache Storm.
- Ingested streaming data into Hadoop using Spark, the Storm framework, and Scala.
- Good experience with NoSQL databases such as MongoDB.
- Responsible for creating producer and consumer APIs using Kafka.
- Experienced in handling large datasets using Spark's in-memory capabilities, broadcast variables, and effective, efficient joins and transformations.
- Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
- Used Spark for interactive queries, streaming data processing, and integration with popular NoSQL databases for large data volumes.
- Developed a data pipeline using Kafka, HBase, Mesos, Spark, and Hive to ingest, transform, and analyze customer behavioral data.
Environment: Hadoop, HDFS, CDH, Pig, Hive, Oozie, ZooKeeper, HBase, Spark, Storm, Spark SQL, NoSQL, Scala, Kafka, Mesos, MongoDB
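A minimal sketch of the Hive dynamic-partitioning pattern noted above, expressed here in PySpark with Hive support (the original work used Scala and HiveQL); the database, table, and column names, and the staging.raw_events source table, are illustrative assumptions:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-dynamic-partitions")
         .enableHiveSupport()
         .getOrCreate())

# Allow non-strict dynamic partitioning so the partition value can come from the data.
spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

# Partitioned Hive table (names and columns are illustrative).
spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.events_by_day (
        event_id STRING,
        user_id  STRING,
        payload  STRING
    )
    PARTITIONED BY (event_date STRING)
    STORED AS ORC
""")

# Dynamic-partition insert: each distinct event_date becomes its own partition.
spark.sql("""
    INSERT OVERWRITE TABLE analytics.events_by_day PARTITION (event_date)
    SELECT event_id, user_id, payload, event_date
    FROM staging.raw_events
""")
```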
Confidential, Atlanta, GA
Hadoop Developer
Responsibilities:
- Involved in discussions with business users to gather the required knowledge.
- Analyzed the requirements to develop the framework.
- Designed and developed the architecture for a data services ecosystem spanning relational, NoSQL, and Big Data technologies.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data techniques.
- Developed Java Spark Streaming scripts to load raw files and the corresponding processed metadata files into AWS S3 and an Elasticsearch cluster.
- Implemented PySpark logic to transform and process data in various formats such as XLSX, XLS, JSON, and TXT, as illustrated in the sketch at the end of this role.
- Built scripts to load PySpark-processed files into Redshift and applied a range of PySpark transformations.
- Developed scripts to monitor and capture the state of each file as it moves through the pipeline.
- Designed and developed a real-time stream processing application using Spark, Kafka, Scala, and Hive to perform streaming ETL and apply machine learning.
- Developed MapReduce programs to cleanse data in HDFS obtained from heterogeneous data sources.
- Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs, and used Oozie operational services for batch processing and dynamic workflow scheduling.
- Migrated existing applications and developed new applications using AWS cloud services.
- Worked with data investigation, discovery, and mapping tools to scan data records from many sources.
- Implemented shell scripts to automate the whole process.
- Extracted data from SQL Server to create automated visualization reports and dashboards on Tableau
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and log files
Environment: AWS S3, Java, Maven, Python, Spark, Kafka, Elasticsearch, Amazon Redshift, Shell script, PySpark, Pig, Hive, Oozie, JSON
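A minimal PySpark sketch of the raw-file normalization and S3 load described above; the bucket paths, column names, and delimiter are illustrative assumptions, and the Redshift COPY that follows the write is not shown:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("raw-file-normalizer").getOrCreate()

# JSON events (schema inferred; paths are illustrative).
json_df = spark.read.json("s3a://example-raw-bucket/events/json/")

# Pipe-delimited TXT extracts read as CSV with a custom delimiter.
txt_df = (spark.read
          .option("sep", "|")
          .option("header", "true")
          .csv("s3a://example-raw-bucket/events/txt/"))

# Normalize both sources to a common set of columns before combining.
common_cols = ["event_id", "event_ts", "source_system"]
combined = (json_df.select(*common_cols)
            .unionByName(txt_df.select(*common_cols))
            .withColumn("load_date", F.current_date()))

# Write Parquet to S3, partitioned by load date, ready for a Redshift COPY.
(combined.write
 .mode("append")
 .partitionBy("load_date")
 .parquet("s3a://example-processed-bucket/events/"))
```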
Confidential, Seattle, WA
Hadoop Developer
Responsibilities:
- Developed simple to complex MapReduce jobs in Java for processing and validating the data.
- Developed a data pipeline using Sqoop, Spark, MapReduce, and Hive to ingest, transform, and analyze customer behavioral data.
- Exported analyzed data to relational databases using Sqoop so the BI team could generate visualization reports.
- Implemented Spark jobs using Python and Spark SQL for faster data processing and for real-time analysis algorithms in Spark.
- Used Spark for interactive queries, streaming data processing, and integration with popular NoSQL databases for large data volumes.
- Used the Spark-Cassandra Connector to load data to and from Cassandra, and streamed data in real time using Spark with Kafka, as illustrated in the sketch at the end of this role.
- Developed Kafka producers and consumers in Java, integrated them with Apache Storm, and ingested data into HDFS and HBase by implementing the rules in Storm.
- Built a prototype for real-time analysis using Spark Streaming and Kafka.
- Moved log files generated from various sources to HDFS through Flume for further processing.
- Created Hive tables, worked on them using HiveQL, and performed data analysis using Hive and Pig.
- Developed Oozie workflows to manage and schedule jobs on the Hadoop cluster, triggering daily, weekly, and monthly batch cycles.
- Experience in job workflow scheduling and monitoring tools like Oozie and Zookeeper
- Expertise in extending Hive and Pig core functionalities by writing custom User Defined Functions (UDF)
- Used Impala to pull data from Hive tables.
- Worked on Apache Flume to collect and aggregate large amounts of log data, and stored it on HDFS for further analysis.
- Created and developed an end-to-end data ingestion pipeline onto Hadoop.
- Involved in the architecture and design of a distributed time-series database platform using NoSQL technologies such as Hadoop/HBase and ZooKeeper.
- Integrated NoSQL databases such as HBase with MapReduce to move bulk data into HBase.
- Efficiently put and fetched data to/from HBase by writing MapReduce jobs.
Environment: Hadoop, Kafka, Spark, Sqoop, Spark SQL, Spark Streaming, Hive, Scala, Pig, NoSQL, Impala, Oozie, HBase, ZooKeeper
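A minimal Structured Streaming sketch of the Kafka-to-HDFS path described above, written in PySpark (the original producers and consumers were in Java, and the stream also fed Storm and HBase, which is not shown); the broker addresses, topic name, paths, and the required spark-sql-kafka package on the classpath are assumptions:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-stream-ingest").getOrCreate()

# Subscribe to a Kafka topic (broker addresses and topic name are illustrative).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
       .option("subscribe", "clickstream")
       .option("startingOffsets", "latest")
       .load())

# Kafka delivers key/value as binary; cast the value to a string for downstream parsing.
events = raw.selectExpr("CAST(value AS STRING) AS json_value",
                        "timestamp AS kafka_ts")

# Persist micro-batches to HDFS as Parquet, with a checkpoint for recovery.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/streams/clickstream/")
         .option("checkpointLocation", "hdfs:///checkpoints/clickstream/")
         .outputMode("append")
         .start())

query.awaitTermination()
```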
Confidential
Java Developer
Responsibilities:
- Identified system requirements and developed system specifications; responsible for high-level design and development of use cases.
- Involved in designing Database Connections using JDBC.
- Organized and participated in meetings with clients and team members.
- Developed a web-based Bristow application using J2EE (Spring MVC framework), POJOs, JSP, JavaScript, HTML, jQuery, and business classes and queries to retrieve data from the backend.
- Developed client-side validation techniques using jQuery.
- Worked with Bootstrap to develop responsive web pages.
- Implemented client-side and server-side data validations using JavaScript.
- Responsible for customizing data model for new applications by using Hibernate ORM technology
- Involved in the implementation of DAOs and DTOs using Spring with Hibernate ORM.
- Implemented Hibernate for the ORM layer in transacting with MySQL database.
- Developed authentication and access control services for the application using Spring LDAP.
- Experience in event-driven applications using AJAX, object-oriented JavaScript, JSON, and XML.
- Good knowledge of developing asynchronous applications using jQuery; valuable experience with form validation using regular expressions and jQuery Lightbox.
- Used MySQL for the EIS layer.
- Involved in design and Development of UI using HTML, JavaScript and CSS.
- Designed and developed various data gathering forms using HTML, CSS, JavaScript, JSP and Servlets.
- Developed user interface modules using JSP, Servlets and MVC framework.
- Experience implementing J2EE standards and MVC2 architecture using the Struts framework.
- Developed J2EE components on Eclipse IDE.
- Used JDBC to invoke stored procedures and for database connectivity to SQL databases.
- Deployed the applications on Tomcat Application Server
- Developed web services using REST and JSON.
- Created Java Beans accessed from JSPs to transfer data across tiers.
- Modified databases using SQL, PL/SQL, stored procedures, triggers, and views in Oracle 9i.
Environment: Java, JSP, Servlets, JDBC, Eclipse, Web services, Spring 3.0, Hibernate 3.0, MySQL, JSON, Struts, HTML, JavaScript, CSS.
Confidential
Java Developer
Responsibilities:
- Used Eclipse for writing code for JSP, Servlets.
- Involved in designing the user interface using JSPs.
- Developed Application using Core Java Concepts.
- Used JDBC to invoke stored procedures and for database connectivity to Oracle.
- Used Struts Framework along with JSP, HTML5 to construct the dynamic web pages for the application.
- Participated in feature team meetings and code review meetings.
- Responsible for writing SQL queries using MySQL and Oracle 10g.
- Developed various J2EE components like Servlets, JSP, AJAX, SAX, and JMS.
- Used the Spring MVC framework to enable interactions with the JSP/view layer and implemented different design patterns.
- Utilized JSP, HTML5, CSS3, Bootstrap, and AngularJS for front-end development.
- Used JPA and Hibernate annotations for defining object relational metadata.
- Implemented the business layer using Core Java and Spring beans with dependency injection and Spring annotations.
- Used a microservice architecture with Spring Boot-based services interacting with and leveraging AWS to build, test, and deploy identity microservices.
Environment: Java, J2EE, JSP, JPA, AJAX, SAX, JMS, HTML5, CSS3, Bootstrap, AngularJS, JavaScript, Hibernate, Spring MVC, Eclipse, Oracle, SQL, MySQL, Spring Beans