Hadoop Developer Resume
Plano, Texas
SUMMARY
- Around 8 years of experience in analysis, architecture, design, development, testing, maintenance, and user training of software applications.
- Experience in developing MapReduce programs using Apache Hadoop for analyzing big data as per requirements.
- Good working knowledge of data transformations and loading using export and import.
- Hands-on experience using Sqoop to import data into HDFS from RDBMS and vice versa.
- Used different Hive SerDes such as RegexSerDe and the HBase SerDe.
- Experience in analyzing data using Hive, Pig Latin, and custom MR programs in Java.
- Hands-on experience writing Spark SQL scripts (see the sketch at the end of this summary).
- Sound knowledge of programming Spark using Scala.
- Good understanding of real-time data processing using Spark.
- Hands-on experience with job scheduling and monitoring tools such as Oozie and ZooKeeper, and with Kafka for messaging.
- Developed small distributed applications in our projects using ZooKeeper and scheduled the workflows using Oozie.
- Developed REST APIs using Java, the Play Framework, and Akka.
- Experience in maintaining the big data platform using open-source technologies such as Spark and Elasticsearch.
- Used the play-json library to interact with microservices.
- Experienced in building data pipelines using Kafka and Akka to handle terabytes of data.
- Used Pig as an ETL tool for transformations, event joins, filtering, and some pre-aggregation.
- Clear understanding of Hadoop architecture and its components, such as HDFS, JobTracker and TaskTracker, NameNode and DataNode, Secondary NameNode, and MapReduce programming.
- Expertise in writing custom UDFs to extend Hive and Pig core functionality.
- Hands-on experience extracting data from log files and copying it into HDFS using Flume.
- Experience with NoSQL databases such as HBase.
- Experience in Hadoop administration activities such as installation and configuration of clusters using Apache and Cloudera distributions.
- Experience with and a solid understanding of microservices.
- Hands-on experience installing and configuring Hadoop ecosystem components such as MapReduce, HDFS, Pig, Hive, Sqoop, Flume, Knox, Tez, Storm, Kafka, Oozie, and HBase using Ambari and Ambari Blueprints.
- Knowledge of installing, configuring, and using Hadoop components such as Hadoop MapReduce (MR1), YARN (MR2), HDFS, Hive, Pig, Flume, and Sqoop.
- Experience in MapR, Cloudera, and EMR Hadoop distributions.
- Experience in analyzing, designing, and developing ETL strategies and processes, writing ETL specifications, and Informatica development.
- Migrated DataStage 8.7 to Talend for ETL with Hadoop.
- Extensively used Informatica PowerCenter for the extraction, transformation, and loading process.
- Experience in dimensional data modeling using star and snowflake schemas.
- Worked on reusable code known as tie-outs to maintain data consistency.
- Domain Exposure - Banking, Healthcare and Telecom.
- More than 4 years of experience in Java, J2EE, Web Services, SOAP, SOA, HTML, and XML-related technologies, demonstrating strong analytical and problem-solving skills, computer proficiency, and the ability to follow projects through from inception to completion.
- Experience in building web applications using Spring Framework features such as Spring ORM, Spring MVC, Spring DAO, Spring AOP, Spring Context, Spring Security, Spring Core, Spring IoC, Spring Batch, and Web Services, using Eclipse, with integration with Hibernate as well as Struts.
- Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases, and with core Java concepts such as OOP, multithreading, collections, and I/O.
- Loading big data with MarkLogic Content Pump or other tools.
- Worked with the Teradata RDBMS and utilities such as FastLoad, MultiLoad, TPump, and FastExport.
- Hands-on experience with JAX-WS, JSP, Servlets, Struts, WebLogic, WebSphere, Hibernate, Spring, JBoss, JDBC, RMI, JavaScript, Ajax, jQuery, Linux, UNIX, WSDL, XML, HTML, AWS, Scala, and Vertica.
- Experience in migrating on-premises environments to Windows Azure for cloud DR using Azure Recovery Services Vault and Azure Backup.
- Expertise in developing Teradata SQL scripts through procedures, functions, and packages to implement business logic.
- Advanced XQuery development using the MarkLogic API.
- Designed, developed, validated, and deployed Talend ETL processes for the DWH team using Hadoop (Pig, Hive).
- Experience with AWS EMR, EC2, Data Pipeline, SNS, Redshift, and the AWS CLI.
- Configured internal load balancers, load-balanced sets, and Azure Traffic Manager.
- Developed applications using Java, RDBMS, and Linux shell scripting.
- Experience in complete project life cycle of Client Server and Web applications.
- Good understanding of Data Mining and Machine Learning techniques.
- Ran MarkLogic on AWS.
- Good interpersonal and communication skills, strong problem-solving skills, the ability to explore and adopt new technologies with ease, and a good team player.
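As referenced above, a minimal Spark SQL sketch in Scala follows. It only illustrates the kind of scripting summarized in this section; the Hive table name (transactions), its columns, and the output path are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object SparkSqlSketch {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark SQL read tables defined in the Hive metastore
    val spark = SparkSession.builder()
      .appName("SparkSqlSketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical Hive table: transactions(account_id, amount, txn_date)
    val daily = spark.sql(
      """SELECT txn_date, account_id, SUM(amount) AS total_amount
        |FROM transactions
        |GROUP BY txn_date, account_id""".stripMargin)

    // Write the aggregate back out for downstream reporting (hypothetical path)
    daily.write.mode("overwrite").parquet("/data/reports/daily_totals")

    spark.stop()
  }
}
```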
TECHNICAL SKILLS
Big Data/Hadoop Framework: HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, ZooKeeper, Flume, HBase, Spark
Databases: MongoDB, Microsoft SQL Server, MySQL, Oracle, Cassandra, ODI
Languages: Scala, Java, Python, C, C++, SQL, TSQL, Pig Latin, HiveQL
Web Technologies: JSP, JavaBeans, JDBC, XML
Operating Systems: Windows, Unix and Linux
Front-End: HTML/HTML5, CSS3, JavaScript/jQuery
Development Tools: Microsoft SQL Studio, Toad, Eclipse, NetBeans, MySQL Workbench, Tableau.
Reporting Tool: SSRS, Succeed
Office Tools: Microsoft Office Suite
Development Methodologies: Agile/Scrum, Waterfall
PROFESSIONAL EXPERIENCE
Confidential, Plano, Texas
Hadoop Developer
Responsibilities:
- Involved in end-to-end data processing, including ingestion, processing, quality checks, and splitting.
- Streamed data in real time using Spark Streaming with Kafka (see the sketch at the end of this role).
- Developed Spark scripts in Scala as per requirements.
- Loaded data into Spark RDDs and performed in-memory computation to generate output responses.
- Performed various transformations and actions on RDDs to meet business requirements.
- Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze data.
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, HBase, and Sqoop.
- Involved in loading data from UNIX file system to HDFS.
- Created HBase tables to store variable data formats of PII data coming from different portfolios.
- Implemented best offer logic using Pig scripts and Pig UDFs.
- Responsible for managing data coming from various sources.
- Installed and configured Hive and wrote Hive UDFs.
- Created Maven archetypes for generating fully functional REST web services supporting both XML and JSON message transformation; archetypes were built on Spring technology.
- Made AJAX calls to connect to the database via RESTful web APIs and integrated the middleware with the front end.
- Implemented Auto Complete/Auto Suggest functionality using Ajax, jQuery, DHTML, web services, and JSON.
- Worked as part of a microservices team to develop and deliver Maven projects deployed on Tomcat.
- Core services use the main database, while the other microservices use their own databases to access and store data.
- Experience loading and transforming large sets of structured, semi-structured, and unstructured data.
- Using Hadoop, this project helped Cotiviti Healthcare perform predictive and risk analysis.
- Provided cluster coordination services through ZooKeeper.
- Experience in migrating on-premises environments to Windows Azure for cloud DR using Azure Recovery Services Vault and Azure Backup.
- Uploaded VHD files into an Azure Storage account using AzCopy.
- Configured internal load balancers, load-balanced sets, and Azure Traffic Manager.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Responsible for setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Created data extracts by connecting to Teradata and placed the files on the Tableau server.
- Developed Teradata SQL scripts through procedures, functions, and packages to implement business logic.
- Installed and configured Hadoop MapReduce and HDFS.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Installed and configured Pig.
- Writing MarkLogic triggers to log data receipt.
- Writing MarkLogic scheduled tasks for sending batches of data.
- Involved in managing and reviewing Hadoop log files.
- Imported data from MySQL to HDFS using Sqoop on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Responsible for writing Hive queries for data analysis to meet business requirements.
- Responsible for creating Hive tables and working on them using Hive QL.
- Responsible for importing and exporting data into HDFS and Hive using Sqoop.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
- Involved in scheduling the Oozie workflow engine to run multiple Hive jobs.
Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, Java, Oozie, HBase, Kafka, Spark, Scala, Eclipse, Linux, Oracle, Teradata.
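A minimal Scala sketch of the Spark Streaming with Kafka ingestion referenced above. It assumes the spark-streaming-kafka-0-10 integration; the broker addresses, topic name, consumer group, and HDFS output path are hypothetical.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaStreamingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaStreamingSketch")
    val ssc = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

    // Hypothetical broker list and consumer group
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092,broker2:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "ingest-group",
      "auto.offset.reset" -> "latest"
    )

    // Subscribe to a hypothetical "events" topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Basic cleanup, then land each micro-batch in HDFS for downstream Hive processing
    stream.map(_.value)
      .filter(_.nonEmpty)
      .foreachRDD { rdd =>
        if (!rdd.isEmpty()) {
          rdd.saveAsTextFile(s"/data/raw/events/batch_${System.currentTimeMillis()}")
        }
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```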
Confidential, Houston, TX
Hadoop Developer
Responsibilities:
- Involved in review of functional and non-functional requirements.
- Facilitated knowledge transfer sessions.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Experienced in defining job flows.
- Experienced in managing and reviewing Hadoop log files.
- Extracted files from the RDBMS through Sqoop, placed them in HDFS, and processed them.
- Experienced in running Hadoop Streaming jobs to process terabytes of XML-format data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Responsible for managing data coming from various sources.
- Gained good experience with NoSQL databases such as HBase.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Gained very good business knowledge on health insurance, claim processing, fraud suspect identification, appeals process etc.
- Developed a custom file system plug-in for Hadoop so it can access files on the data platform.
- This plug-in allows Hadoop MapReduce programs, HBase, Pig, and Hive to work unmodified and access files directly.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
- Wrote Spark programs in Scala, using RDDs for transformations and performing actions on them (see the sketch at the end of this role).
Environment: Java 6 (JDK 1.6), Eclipse, Red Hat Linux, MapReduce, HDFS, Hive, Spark, Oracle 11g/10g, PL/SQL, SQL*Plus, Toad 9.6, Windows NT, UNIX shell scripting.
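A minimal Scala sketch of the RDD transformation and action work referenced above. The input path, the pipe-delimited record layout, and the output path are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddCleaningSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RddCleaningSketch"))

    // Hypothetical pipe-delimited claim records: claim_id|member_id|amount
    val raw = sc.textFile("/data/raw/claims")

    val amountsByMember = raw
      .map(_.split('|'))
      .filter(_.length == 3)           // drop malformed rows (transformation)
      .map(f => (f(1), f(2).toDouble)) // (member_id, amount)
      .reduceByKey(_ + _)              // aggregate per member (transformation)

    // Actions trigger the actual computation
    val recordCount = amountsByMember.count()
    amountsByMember.saveAsTextFile("/data/curated/claims_by_member")

    println(s"Wrote $recordCount aggregated records")
    sc.stop()
  }
}
```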
Confidential, San Mateo, CA
Hadoop Developer
Responsibilities:
- Installation and Configuration of Hadoop Cluster
- Worked with the Cloudera support team to fine-tune the cluster
- Worked closely with the SA team to make sure all hardware and software were properly set up for optimum usage of resources
- Developed a custom file system plug-in for Hadoop so it can access files on the Hitachi Data Platform
- The plug-in allows Hadoop MapReduce programs, HBase, Pig, and Hive to work unmodified and access files directly
- The plug-in also provided data locality for Hadoop across host nodes and virtual machines
- Wrote data ingesters and MapReduce programs
- Developed MapReduce jobs to analyze data and provide heuristic reports
- Good experience in writing data ingesters and complex MapReduce jobs in Java for data cleaning and preprocessing, and fine-tuning them per data set
- Performed extensive data validation using Hive and wrote Hive UDFs
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs; wrote extensive scripts (Python and shell) to provision and spin up virtualized Hadoop clusters
- Added, decommissioned, and rebalanced nodes
- Created a POC to store server log data in Cassandra to identify system alert metrics (see the sketch at the end of this role)
- Rack-aware configuration
- Configured client machines
- Configured monitoring and management tools
- HDFS Support and Maintenance
- Cluster HA Setup
- Applied patches and performed version upgrades
- Incident management, problem management, and change management
- Performance Management and Reporting
- Recovered from NameNode failures
- Scheduled MapReduce jobs using FIFO and Fair schedulers
- Installed and configured other open-source software such as Pig, Hive, HBase, Flume, and Sqoop
- Integrated with RDBMS using Sqoop and JDBC connectors
- Worked with the dev team to tune jobs; knowledge of writing Hive jobs
Environment: Windows 2000/2003, UNIX, Linux, Java, Apache Hadoop, HDFS, MapReduce, Pig, Hive, HBase, Flume, Sqoop, Cassandra, NoSQL
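A minimal sketch, in Scala with the DataStax Java driver, of the kind of Cassandra POC for server log data referenced above. The contact point, keyspace, table, and column names are all hypothetical.

```scala
import com.datastax.driver.core.Cluster

object CassandraLogPoc {
  def main(args: Array[String]): Unit = {
    // Hypothetical contact point; in the POC this would be a Cassandra node in the cluster
    val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
    val session = cluster.connect()

    session.execute(
      """CREATE KEYSPACE IF NOT EXISTS logs
        |WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}""".stripMargin)
    session.execute(
      """CREATE TABLE IF NOT EXISTS logs.server_alerts (
        |  host text, ts timestamp, level text, message text,
        |  PRIMARY KEY (host, ts))""".stripMargin)

    // Store one alert record keyed by host and timestamp
    session.execute(
      "INSERT INTO logs.server_alerts (host, ts, level, message) VALUES (?, ?, ?, ?)",
      "node-01", new java.util.Date(), "WARN", "disk usage above 80%")

    session.close()
    cluster.close()
  }
}
```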
Confidential
Hadoop Developer
Responsibilities:
- Worked on analyzing Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase and Sqoop.
- Installed Hadoop, MapReduce, and HDFS, and developed multiple Pig and Hive jobs for data cleaning and pre-processing.
- Coordinated with business customers to gather business requirements, also interacted with other technical peers to derive technical requirements and delivered the BRD and TDD documents.
- Involved in loading the created HFiles into HBase for faster access to a large customer base without taking a performance hit.
- Created HBase tables to store various data formats of PII data coming from different portfolios.
- Extensively involved in Design phase and delivered Design documents.
- Involved in Testing and coordination with business in User testing.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Involved in creating Hive tables, loading data, and writing queries that run internally as MapReduce jobs.
- Involved in processing ingested raw data using MapReduce, Apache Pig and HBase.
- Involved in developing Pig Scripts for change data capture and delta record processing between newly arrived data and already existing data in HDFS.
- Populated HDFS with huge amounts of data using Apache Kafka.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Used Hive to analyze the partitioned and bucketed data to compute various metrics for reporting.
- POC work is in progress using Spark and Kafka for real-time processing.
- Designed the technical solution for real-time analytics using Kafka and HBase (see the sketch at the end of this role).
- Implemented test scripts to support test driven development and continuous integration.
- Used Pig as an ETL tool to do transformations, event joins, and some pre-aggregations before storing the data on HDFS.
- Developed Hadoop Streaming MapReduce jobs using Python.
Environment: Hadoop, Hive, MapReduce, Pig, MongoDB, Oozie, Sqoop, Kafka, Cloudera, Spark, HBase, HDFS, Scala, Solr, ZooKeeper.
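A minimal Scala sketch, using the HBase client API, of the kind of HBase table access behind the real-time analytics work referenced above. The table name, column family, row key, and values are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseAccessSketch {
  def main(args: Array[String]): Unit = {
    // Picks up hbase-site.xml from the classpath for the ZooKeeper quorum details
    val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = connection.getTable(TableName.valueOf("customer_events")) // hypothetical table

    // Write one cell: row key = customer id + event time, column family "d"
    val put = new Put(Bytes.toBytes("cust42#20160301T120000"))
    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("event_type"), Bytes.toBytes("claim_submitted"))
    table.put(put)

    // Read it back
    val result = table.get(new Get(Bytes.toBytes("cust42#20160301T120000")))
    val eventType = Bytes.toString(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("event_type")))
    println(s"event_type = $eventType")

    table.close()
    connection.close()
  }
}
```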
Confidential
Java developer
Responsibilities:
- Developed the complete website for the company from scratch and deployed it.
- Involved in requirements gathering.
- Designed and developed user interface using HTML, CSS and JavaScript.
- Designed HTML screens with JSP for the front-end.
- Involved in Database Design by creating Data Flow Diagram (Process Model) and ER Diagram (Data Model).
- Designed, created, and maintained the database using MySQL.
- Made JDBC calls from the Servlets to the database to store user details (see the sketch at the end of this role).
- JavaScript was used for client-side validation.
- Servlets were used as controllers, with Entity/Session Beans for business logic.
- Used Eclipse for project building
- Participated in User review meetings and used Test Director to periodically log the development issues, production problems and bugs.
- Used WebLogic to deploy the application to local and development environments.
- Debugged and fixed errors.
- Implemented and supported the project from development and unit testing through to the production environment.
- Involved in documenting the application.
- Involved in designing stored procedures to extract and calculate billing information, connecting to Oracle.
- Formatted the results from the database as HTML reports for the client.
- Used PVCS Version Manager for source control and PVCS Tracker for change control management.
- Implemented test-first unit testing using JUnit.
Environment: Java, JSP, Servlets, JDBC, Java Script, HTML, CSS, WebLogic, Eclipse and Test Director
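The JDBC work referenced above was done in Java; purely for consistency with the other sketches in this document, a minimal Scala sketch of that kind of JDBC call is shown below. The connection URL, credentials, and the users table are hypothetical.

```scala
import java.sql.DriverManager

object UserDaoSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical MySQL connection details
    val conn = DriverManager.getConnection(
      "jdbc:mysql://localhost:3306/company_site", "app_user", "secret")
    try {
      // Store the user details captured by the registration Servlet (hypothetical table)
      val stmt = conn.prepareStatement(
        "INSERT INTO users (username, email) VALUES (?, ?)")
      stmt.setString(1, "jdoe")
      stmt.setString(2, "jdoe@example.com")
      stmt.executeUpdate()
      stmt.close()
    } finally {
      conn.close()
    }
  }
}
```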