Hadoop Developer Resume
Miami, FL
SUMMARY:
- 8+ years of IT experience in development, enhancements, production support and administration, including 6 years with Big Data ecosystem technologies.
- Two years of hands-on experience using the Spark framework with Scala.
- 4 years of Java programming experience in developing web-based applications and client-server technologies.
- Good knowledge of Hadoop cluster architecture and cluster monitoring.
- Excellent understanding / knowledge of Hadoop architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and MapReduce programming paradigm.
- Strong knowledge and development experience in Hadoop and the Big Data ecosystem including MapReduce, HDFS, Hive, Pig, Spark, Cloudera Navigator, Mahout, HBase, Zookeeper, Kafka, Storm, Sqoop, Flume, Oozie and Impala.
- Extensive knowledge in programming with Resilient Distributed Datasets (RDDs).
- Strong experience as a Java Developer in web/intranet and client/server technologies using Java, J2EE, Servlets, JSP, JSF, EJB, JDBC, SQL, the Hibernate framework and the Spring framework.
- Experience in NoSQL Column-Oriented Databases like HBase and its Integration with Hadoop cluster.
- Experience in Amazon AWS services such as EMR, EC2, S3, CloudFormation and Redshift, which provide fast and efficient processing of Big Data.
- Extensive experience in data ingestion, big data storage planning, complex transformations, data integration, analysis for Pharmaceutical, Healthcare and Retail sectors.
- Experience with programming and scripting languages (SQL, Scala, Java, Pig, Bash/Python) to manipulate data.
- Procedural knowledge in cleansing and analyzing data using Hive QL, Pig Latin, and custom MapReduce programs in Java.
- Worked with various file formats such as delimited text files, clickstream log files, Apache log files, Avro files, JSON files and XML files.
- Familiar with data architecture including data ingestion pipeline design, Hadoop information architecture, data modeling and data mining, machine learning and advanced data processing. Experience optimizing ETL workflows.
- Expertise with using Oracle, MySQL, DB2 databases and writing highly complex SQL queries.
- Experienced in using Integrated Development environments like Eclipse, NetBeans, IntelliJ, Spring Tool Suite.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Developed simple to complex MapReduce jobs using Java language.
- Excellent understanding and knowledge of NoSQL databases like MongoDB, HBase, and Cassandra.
- Experience in data warehousing and in extracting, transforming and loading (ETL) data from various sources like Oracle, Teradata, DB2, Microsoft Excel and flat files into data warehouses and data marts using Informatica PowerCenter.
- Excellent working knowledge of popular frameworks like Struts, Hibernate, and Spring MVC.
- Experience in Agile Engineering practices.
- Good experience working with Hortonworks Distribution and Cloudera Distribution.
- Involved in upgrading existing MongoDB instances from version 2.4 to version 2.6 by upgrading the security roles and implementing newer features.
- Experienced in providing technical solutions to the business for applications developed on Hadoop and its ecosystem. Experience in cloud platforms like AWS and Azure.
- Handled importing the data from various data sources and performed transformation using Java, Hive, Pig, Yarn, HBase, Sqoop, Oozie, Flume, Windows Azure, Zookeeper.
- Knowledge of data warehousing and ETL tools like Informatica, Talend and Pentaho.
- Analyzed large amounts of data in formats including XML, JSON and relational files from different data sources.
- Experienced in different application servers like JBoss/Tomcat, WebLogic and IBM WebSphere.
- Worked with efficient storage formats like Parquet, Avro and ORC and integrated them with Hadoop and its ecosystem (Hive, Impala and Spark). Also used compression codecs like Snappy and zlib.
- Built and published customized interactive reports and dashboards and scheduled reports using Tableau Server.
- Created Hive external tables and loaded data into them through HDFS.
- Experience in extending Hive and Pig core functionality by using custom UDFs and UDAFs.
- Good experience working with analysis tools like Tableau for regression analysis, pie charts, and bar graphs.
- Good understanding of Software Development Life Cycle (SDLC) and sound knowledge of project implementation methodologies including Waterfall and Agile.
- Ability to work in high-pressure environments delivering to and managing stakeholder expectations.
- Applied structured methods to project scoping and planning, risks, issues, schedules and deliverables.
- Strong analytical and problem-solving skills.
- Excellent interpersonal and communication skills; creative, research-minded, technically competent and result-oriented, with problem-solving and leadership skills.
TECHNICAL SKILLS:
Big Data Ecosystem: HDFS, Sqoop, Flume, Hive, Pig, MapReduce, YARN, Oozie, Kafka, Spark, Impala, Storm, Hue, Zookeeper, Parquet, MongoDB, Java, SQL.
Hadoop Distributions: Hortonworks, Cloudera, Amazon, IBM Big Insight
Databases & NoSQL Databases: Oracle, MySQL, Microsoft SQL Server, HBase, Cassandra and MongoDB.
RDBMS: Teradata, Oracle PL/SQL, MS SQL Server, MySQL and DB2
Operating Systems: Linux, UNIX, Windows.
Development Methodologies: Agile/Scrum, Waterfall.
Frameworks: Struts, Spring and Hibernate
IDEs & Tools: Eclipse, NetBeans, GitHub, Jenkins, Maven, IntelliJ, Ambari.
Programming Languages: C, C++, JSE, XML, JSP/Servlets, Spring, HTML, JavaScript, JQuery, Web services, Python, Scala, PL/SQL & Shell Scripting.
ETL Tools: Ab Initio, Informatica PowerCenter and Pentaho
PROFESSIONAL EXPERIENCE:
Confidential, Miami, FL
Hadoop Developer
Responsibilities:
- Created end-to-end Spark applications using Scala to perform data cleansing, validation, transformation and summarization on user behavioral data.
- Developed custom Input Adaptor utilizing the HDFS File system API to ingest click stream log files from FTP server to HDFS.
- Developed end-to-end data pipeline using FTP Adaptor, Spark, Hive and Impala.
- Used Scala to write code for all Spark use cases.
- Implemented design patterns in Scala for the application.
- Implemented Spark jobs in Scala and used Spark SQL heavily for faster development and processing of data.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and YARN.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Java and Scala.
- Used Scala collection framework to store and process the complex consumer information.
- Implemented a prototype for real-time data streaming using Spark Streaming with Kafka.
- Imported other enterprise data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the data into HBase tables.
- Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team.
- Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis. Analyzed the data by running Hive queries (HiveQL) and Pig scripts (Pig Latin) to study customer behavior.
- Implemented real-time data ingestion using Kafka.
- Used Spark Streaming APIs to perform transformations and actions on the fly to build a common learner data model that reads data from Kafka in near real time and persists it to Cassandra.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Used Spark for interactive queries, processing of streaming data and integration with a popular NoSQL database for large volumes of data; worked with the MapR distribution and HDFS.
- Created components like Hive UDFs for missing functionality in HIVE for analytics.
- Worked on various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Created, validated and maintained scripts to load data manually using Sqoop.
- Created Oozie workflows and coordinators to automate Sqoop jobs weekly and monthly.
- Installed and configured Apache Hadoop, Hive and Pig environment on AWS.
- Implemented POC Spark Cluster on AWS.
- Uploaded and processed more than 30 terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop and Flume.
- Used TIBCO Jaspersoft Studio for iReport analysis on AWS cloud.
- Used Reporting tools like Tableau to connect with Hive for generating daily reports of data.
- Created reports for the BI team using Sqoop to import data into HDFS and Hive.
- Used Oozie workflows and coordinators to deploy end-to-end data processing pipelines and schedule the workflows.
- Continuously monitored and managed the Hadoop cluster.
- Used JUnit framework to perform Unit testing of the application.
- Developed interactive shell scripts for scheduling various data cleansing and data loading processes.
- Performed data validation on the data ingested using Spark by building a custom model to filter all the invalid data and cleanse the data.
- Experience with data wrangling and creating workable datasets.
Environment: HDFS, Pig, Hive, Sqoop, Flume, Spark, MapReduce, Scala, Tableau, Oozie, Cassandra, YARN, UNIX Shell Scripting, Agile Methodology
Confidential, Mansfield, MA
Hadoop Developer
Responsibilities:
- Developed MapReduce jobs using the Java API to parse the raw data and store the refined data.
- Developed custom Pig & Hive UDFs in Java for transforming the data as per the requirement for analysis.
- Supported MapReduce programs running on the cluster.
- Used Tableau to expose data for further analysis and to transform files from different analytical formats into text files.
- Integrated the Hive warehouse with HBase.
- Cluster monitoring, maintenance and troubleshooting.
- Created staging tables and ingested data as dynamic partitions in Hive.
- Imported data using Sqoop from the MySQL tables.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Involved in loading data from UNIX file system to HDFS.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Explored Spark to improve the performance and optimization of the existing algorithms in Hadoop.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Implemented Oozie workflow for automating the complete process.
- Managed and scheduled jobs on a Hadoop cluster.
- Wrote multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and other compressed file formats.
- Worked with Avro Data Serialization system to work with JSON data formats.
- Performed data validation on the data ingested using MapReduce by building a custom model to filter out all invalid data and cleanse the data.
- Created HBase tables to store large sets of structured and unstructured data.
- Used Flume to collect, aggregate, and store the log data from different web servers.
- Worked collaboratively with all levels of business stakeholders to architect, implement and test Big Data based analytical solution from disparate sources.
- Responsible for managing data coming from different sources.
- Managed and reviewed Hadoop log files.
- Mentored analyst team in writing required Hive Queries.
Environment: OEM, CDH 5.0.6, HDFS, Scala, MapReduce, HBase, Tableau, Java, Python, Hive, Flume, Pig, SQL and Sqoop.
Confidential, Flint, MI
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH3 distribution.
- Good understanding and related experience with Hadoop stack - internals, Hive, Pig and MapReduce.
- Wrote MapReduce jobs in Python to discover trends in data usage by users.
- Involved in defining job flows.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Involved in managing and reviewing Hadoop log files.
- Implemented HBase coprocessors to notify the support team when data is inserted into HBase tables.
- Involved in running Hadoop streaming jobs to process terabytes of text data.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Responsible for managing data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive and wrote HiveQL scripts.
- Implemented partitioning, dynamic partitions and buckets in Hive.
- Experienced with working on Avro Data files using Avro Serialization system.
- Solved the small-files problem by processing SequenceFiles in MapReduce.
- Monitored system health and logs and responded accordingly to any warning or failure conditions.
- Performed cluster co-ordination through Zookeeper.
- Used ClearCase for version control.
- Used JUnit for unit testing and Continuum for integration testing.
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Linux/Ubuntu, Java (JDK 1.6), Data Mining, Cloudera Hadoop distribution, HBase, MapR, UNIX Shell Scripting, ClearCase, JUnit.
Confidential, Madison, WI
Big Data Developer
Responsibilities:
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Experienced in managing and reviewing Hadoop log files.
- Imported data from different relational data sources such as Teradata into HDFS using Sqoop.
- Worked on writing transformer/mapping Map-Reduce pipelines using Apache Crunch and Java.
- Imported bulk data into the Cassandra file system using the Thrift API.
- Involved in creating Hive tables, loading data and writing Hive queries to invoke and run MapReduce jobs.
- Performed analysis on time-series data in Cassandra using the Java API.
- Designed and implemented incremental imports into Hive tables.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Ran Hive jobs to parse logs and structure them in tables for effective querying of the log data.
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Worked with Avro Data Serialization system to work with JSON data formats.
- Worked on different file formats like Sequence files, XML files and Map files using MapReduce programs.
- Involved in unit testing and delivering unit test plans and results documentation using JUnit and MRUnit.
- Exported data from HDFS to RDBMS using Sqoop for report generation and data visualization purposes.
- Developed scripts to automate end-to-end data management and synchronization between all the clusters.
- Created and maintained Technical documentation for launching Hadoop Cluster.
Environment: Hadoop, HDFS, MapReduce, Hive, Oozie, Sqoop, Pig, MySQL, Java, REST API, Maven, MRUnit, JUnit
Confidential
Java Developer
Responsibilities:
- Responsible for developing and modifying the existing service layer based on the business requirements.
- Involved in designing & developing web-services using SOAP and WSDL.
- Created tables and stored procedures in SQL for data manipulation and retrieval; performed database modifications using SQL, PL/SQL, stored procedures, triggers and views in Oracle 9i.
- Involved in integration testing the Business Logic layer and Data Access layer.
- Designed the user interface and implemented validation checks using JavaScript.
- Involved in the design of JSPs and Servlets for navigation among the modules.
- Designed cascading style sheets and XML, and performed client-side validations with JavaScript using Business Objects, XML, and JDBC.
- Developed various EJBs for handling business logic and data manipulations from database.
- Integrated JSF with JSP and used JSF Custom Tag Libraries to display the value of variables defined in configuration files.
- Developed UI using HTML, JavaScript, and JSP, and developed Business Logic and Interfacing components.
- Involved in unit testing of the application using the JUnit framework.
- Used Maven builds to wrap Ant build scripts.
- Used TortoiseSVN for version control of code and project documents.
Environment: Eclipse, JQuery, JSP, Servlets, JSF, JDBC, HTML, JUnit, JavaScript, XML, SQL, Maven, Web Services, UML, WebLogic Workshop and SVN.
