Sr. Hadoop Developer Resume
Farmington, CT
PROFESSIONAL SUMMARY:
- 7 years of IT industry experience, including 5 years working with Apache Hadoop ecosystem components such as HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Oozie, ZooKeeper and HBase, along with Cassandra, MongoDB and Amazon Web Services.
- 2 years of experience in application development and maintenance across the SDLC using Java technologies.
- Developed applications for distributed environments using Hadoop, MapReduce and Python.
- Developed MapReduce jobs to automate transfer of data from HBase.
- Developed and maintained web applications on the Apache Tomcat web server.
- Experience integrating Hadoop with Ganglia, with a good understanding of Hadoop metrics and their visualization in Ganglia.
- Good experience working with the Hortonworks, Cloudera and MapR distributions.
- Very good understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and Secondary NameNode, as well as MapReduce concepts.
- Experience in data extraction and transformation using MapReduce jobs.
- Proficient in working with Hadoop and HDFS and in writing Pig and Sqoop scripts.
- Performed data analysis using Hive and Pig.
- Expert in creating Pig and Hive UDFs in Java to analyze data efficiently.
- Experience importing and exporting data with Sqoop between relational database systems and HDFS.
- Strong understanding of NoSQL databases such as HBase, MongoDB and Cassandra.
- Strong understanding of Spark real-time streaming and Spark SQL, with experience loading data from external data sources such as MySQL and Cassandra into Spark applications.
- Experience performing in-memory data processing and real-time streaming analytics using Apache Spark with Scala, Java and Python.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX systems, NoSQL databases and a variety of portfolios.
- Extensive working experience with Amazon Web Services (AWS), using S3 for storage, EC2 for compute, and RDS and EBS.
- Well versed in job workflow scheduling and monitoring tools such as Oozie.
- Loaded streaming log data from various web servers into HDFS using Flume.
- Experience in using Sqoop, Oozie and Cloudera Manager.
- Experience with source control systems such as SVN, CVS and Git.
- Experience improving search focus and quality in Elasticsearch using aggregations and Python scripts.
- Hands-on experience in application development using RDBMS and Linux shell scripting.
- Experience working with Amazon EMR and EC2 Spot Instances.
- Solid understanding of relational database concepts.
- Extensively worked with the Unified Modeling Language (UML), designing use case, activity flow, class, sequence and object diagrams using Rational Rose and MS Visio.
- Very good experience across the complete project life cycle (design, development, testing and implementation) of client-server and web applications.
- Good knowledge of Hadoop cluster architecture and cluster monitoring.
- Adequate knowledge of, and working experience with, Agile and Waterfall methodologies.
- Supported development, testing and operations teams during new system deployments.
- Practical knowledge of integrating Kafka with third-party systems such as Spark and Hadoop.
- Good team player who works efficiently across multiple teams and products, and adapts easily to new systems and environments.
- Possess excellent communication and analytical skills along with a can-do attitude.
TECHNICAL SKILLS:
Programming languages: C, Java, Python, Scala, SQL
HADOOP/BIG DATA: MapReduce, Spark, SparkSQL, PySpark, SparkR, Pig, Hive, Sqoop, HBase, Flume, Kafka, Cassandra, YARN, Oozie, ZooKeeper, Elasticsearch
Databases: MySQL, PL/SQL, MongoDB, HBase, Cassandra.
Operating Systems: Windows, Unix, Linux, Ubuntu.
Web Development: HTML, JSP, JavaScript, jQuery, CSS, XML, AJAX.
Web/Application Servers: Apache Tomcat, Sun Java Application Server
Tools: IntelliJ, Eclipse, NetBeans, Nagios, Ganglia, Maven
Scripting: BASH, JavaScript
Version Controls: GIT, SVN
PROFESSIONAL EXPERIENCE:
Sr. Hadoop Developer
Confidential, Farmington, CT
Responsibilities:
- Involved in the installation, configuration and maintenance of Hadoop clusters for application development using the Cloudera distribution.
- Developed Kafka consumers in Scala using the Kafka Consumer API to consume data from Kafka topics.
- Developed scalable, distributed end-to-end data pipelines in Scala that receive data from the Kafka distributed messaging system and persist it into HDFS with Apache Spark.
- Performance-tuned Spark jobs by caching data and taking full advantage of the cluster environment.
- Worked with a framework in which only the table names, schemas, source file locations, Sqoop parameters, etc. need to be specified; the framework then generates the entire code, including workflow.xml.
- Performed advanced operations such as text analytics and processing using Spark's in-memory computing capabilities in Scala.
- Queried data using Spark SQL and implemented Spark RDDs in Scala.
- Experienced in working with scripting technologies such as Python and UNIX shell scripts.
- Performed a POC on writing Spark applications in Scala, Python and R.
- Optimized Hive queries using partitioning, bucketing, parallel execution and map-side joins.
- Used HiveQL to create Hive tables and to write Hive queries for data analysis.
- Collected log data from web servers and pushed it to HDFS and the NoSQL database Cassandra using Flume.
- Used Oozie workflows to manage and schedule jobs on a Hadoop cluster and used ZooKeeper for cluster coordination services.
- Used Apache NiFi to transform data from different components of the big data ecosystem.
- Worked with data sources such as Oracle, Netezza, MySQL and flat files, and with AWS components such as Amazon EC2 instances, S3 buckets and CloudFormation templates.
- Used Qlik Sense to build customized interactive reports, worksheets and dashboards.
- Worked with file formats such as SequenceFiles, XML files and MapFiles in MapReduce programs.
- Strong expertise in the MapReduce programming model with XML, JSON and CSV file formats.
- Helped manage and organize developers through regular code review sessions following Agile and Scrum methodologies.
- Implemented Spark RDD transformations and actions, DataFrames and case classes on the required data using Spark Core.
- Used Jira for bug tracking and Bitbucket for code hosting and code review.
- Implemented an Apache Airflow DAG to find popular items in Redshift and ingest them into the main PostgreSQL database via a web service call.
- Implemented Spark applications in a data processing project to handle data from various sources, creating DStreams and DataFrames on input data received from streaming services such as Kafka.
Environment: MapReduce, HDFS, Hive, Spark, Spark SQL, Sqoop, Apache Kafka, Java 7, Cassandra, Scala, Apache Pig 0.14.0, Apache Hive 1.0.0, Oozie, Linux, AWS EC2, Agile development, Oracle 11g/10g, UNIX shell scripting, Ambari, Tez, Eclipse, Qlik Sense, Cloudera.
Hadoop Developer
Confidential, Dallas, TX
Responsibilities:
- Used Cassandra Query Language (CQL) to design Cassandra databases and tables with various configuration options.
- Developed Pig UDFs to manipulate data according to business requirements and developed custom Pig loaders.
- Experience integrating Hadoop with Ganglia, with a good understanding of Hadoop metrics and their visualization in Ganglia.
- Involved in the review of functional and non-functional requirements.
- Practical experience in developing Spark applications in Eclipse with Maven.
- Strong understanding of Spark real-time streaming and Spark SQL.
- Loaded data from external data sources such as MySQL and Cassandra into Spark applications.
- Developed Python and shell scripts to automate the end-to-end implementation process of an AI project.
- Selected and configured appropriate Amazon EC2 instances and accessed key AWS services using client tools and AWS SDKs.
- Knowledge of using AWS Identity and Access Management (IAM) to secure access to EC2 instances and of configuring Auto Scaling groups with CloudWatch.
- Firm understanding of optimizations and performance-tuning practices while working with Spark.
- Good knowledge of compression and serialization for improving performance in Spark applications.
- Performed interactive querying using Spark SQL.
- Worked with file formats such as SequenceFiles, XML files and MapFiles in MapReduce programs.
- Strong expertise in the MapReduce programming model with XML, JSON and CSV file formats.
- Designed and implemented static and dynamic partitioning and bucketing in Hive.
- Practical knowledge of using Apache Sqoop to import datasets from MySQL into HDFS and export them back.
- Good knowledge of building predictive models focused on customer service using R.
- Experience in reviewing and managing Hadoop log files.
- Experience in building batch and streaming applications with Apache Spark and Python.
- Used libraries built on Spark MLlib to perform data cleaning and used R for dataset reorganization.
- Debugged CQL queries and implemented performance enhancement practices.
- Strong knowledge of Apache Oozie for scheduling tasks.
- Practical knowledge of integrating Kafka with third-party systems such as Spark and Hadoop.
- Experience in configuring Kafka brokers, consumers and producers for optimal performance.
- Knowledge of creating Apache Kafka consumers and producers in Java.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Practical knowledge of monitoring a Hadoop cluster using Nagios and Ganglia.
- Experience using Git for version control.
- Involved in loading data from UNIX file system to HDFS and developed UNIX scripts for job scheduling, process management and for handling logs from Hadoop.
- Interpreted technical specifications and wrote technical design documents.
- Strong skills in Agile development and test-driven development (TDD).
- Practical knowledge of implementing Internet of Things (IoT) solutions.
Environment: Hadoop Cloudera Distribution (CDH4), Java 7, Hadoop 2.5.2, Spark, Spark SQL, MLlib, R, Scala, Cassandra, IoT, MapReduce, Apache Pig 0.14.0, Apache Hive 1.0.0, HDFS, Sqoop, Oozie, Kafka, Maven, Eclipse, Nagios, Ganglia, ZooKeeper, AWS EC2, Git, Ambari, Tez, UNIX shell scripting, Oracle 11g/10g, Linux, Agile development.
Hadoop Developer
Confidential, Newport Beach, CA
Responsibilities:
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Designed and implemented Incremental Imports into Hive tables.
- Developed Apache Pig and Hive scripts to process HDFS data.
- Involved in defining job flows, managing and reviewing log files.
- Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
- Supported MapReduce programs running on the cluster.
- As a Big Data Developer, implemented solutions for ingesting data from various sources and processing data at rest using big data technologies such as Hadoop, MapReduce, HBase, Hive, Oozie, Flume and Sqoop.
- Bulk-imported data into HBase using MapReduce programs.
- Performed analytics on time series data stored in HBase using the HBase API.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Responsible for continuously monitoring and managing the Elastic MapReduce (EMR) cluster through the AWS console.
- Wrote multiple Java programs to pull data from HBase.
- Processed files using Pig Latin.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Optimized MapReduce algorithms using combiners and partitioners and worked on application performance optimization for an HDFS cluster.
- Debugged and performance-tuned Hive and Pig jobs.
- Used Hive to find correlations between customers' browser logs across different sites and analyzed them to build risk profiles for those sites.
- Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
Environment: Java, Hadoop 2.1.0, MapReduce 2, Pig 0.12.0, Hive 0.13.0, Linux, Sqoop 1.4.2, Flume 1.3.1, Eclipse, AWS EC2, Cloudera CDH 4.
Java Developer
Confidential
Responsibilities:
- Involved in Analysis, Design, Implementation and Bug Fixing Activities.
- Designed the initial web/WAP pages for a better UI according to the requirements.
- Involved in designing basic class diagrams, sequence diagrams and event diagrams as part of the documentation.
- Developed SQL queries and PL/SQL stored procedures to retrieve data from and insert data into multiple database schemas.
- Participated in reviews of functional and technical specification documents and in code reviews.
- Underwent training in the business domain.
- Held discussions and meetings with business analysts to understand the functionality involved in test case reviews.
- Prepared the support guide documenting the complete functionality.
Environment: Core Java, Apache Tomcat 5.1, Oracle 9i, JavaScript, HTML, PL/SQL, Rational Rose, Windows XP, UNIX.
Software Engineer
Confidential
Responsibilities:
- Designed, developed and executed data migration from a DB2 database to an Oracle database using Linux scripts, Java and SQL*Loader.
- A key team member, playing a key role in articulating the design requirements for automated tools that perform error-free configuration.
- Developed UNIX and Java utilities for data migration from DB2 to Oracle; sole developer and point of contact for the migration activity.
- Developed JSP pages, Servlets and HTML pages as per requirement.
- Developed the necessary Java Beans, PL/SQL procedures for the implementation of business rules.
- Developed the user interface for the presentation tier using JavaServer Pages (JSP), HTML and JavaScript.
- Developed JSP pages with client-side validation in JavaScript.
- Developed a custom realm for Apache Tomcat to authenticate users.
- Developed a front controller servlet to handle all requests.
- Developed the web interface using JSP and developed Struts action classes.
- Responsible for gathering both functional and non-functional requirements, performing impact analysis and testing the solutions on a build-by-build basis.
- Coded in Java, JavaScript and HTML.
- Used JDBC to provide connectivity to Oracle database tables.
- Used WebSphere Application Server for application deployment.
- Followed the complete software development life cycle (requirements analysis, design, development, testing, deployment and support).
Environment: J2EE, IBM DB2, IBM WebSphere Application Server, EJB, JSP, Servlets, HTML, CSS, JavaScript, Oracle database, Unix Scripting and Windows 2000.
