Sr. Hadoop Developer Resume
Atlanta, GA
PROFESSIONAL SUMMARY:
- Overall 8 years of experience in the IT industry, including 3 years of experience in the Hadoop ecosystem.
- Hands-on experience using various Hadoop components such as MapReduce, Pig, Hive, Impala, Zookeeper, HBase, Sqoop, Oozie, Flume and Solr for data extraction, storage and analysis.
- Understanding of building and maintaining multiple Hadoop clusters and setting up the rack topology for large clusters. Good experience with setting up and configuring a Hadoop cluster on cloud infrastructure like Amazon Web Services (EC2, EMR, and S3).
- Apart from developing on the Hadoop ecosystem, also have good experience in installing and configuring the Cloudera and Hortonworks distributions.
- Good understanding of Hadoop architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
- Experience in developing Pig Latin and HiveQL scripts for data analysis and ETL purposes; also extended the default functionality by writing custom User Defined Functions (UDFs) for data-specific processing.
- Experience in importing and exporting data between HDFS and Relational Database Management systems using Sqoop.
- Extracted and processed streaming log data from various sources and integrated it into HDFS using Flume.
- Good knowledge of job scheduling and monitoring through Oozie, Amazon Data Pipeline and Zookeeper.
- Familiarity with NoSQL databases like HBase and Cassandra.
- Worked extensively with different data sources: non-relational sources such as flat files and XML files, and relational sources such as Oracle, MySQL, Sybase and DB2.
- Experience with writing Apache Spark programs to process and analyze data. Familiarity with real-time streaming data using Spark and Kafka for fast, large-scale in-memory processing.
- Strong understanding of statistical and business intelligence concepts with project experience in data analysis using Excel, SAS and R. Good understanding of data mining concepts.
- Experience in database design, entity relationships, database analysis, SQL programming, and PL/SQL stored procedures, packages and triggers in Oracle and SQL Server on Windows and Linux. Good knowledge of tuning the performance of SQL queries and ETL processes. Also experienced in working with tools like TOAD, SQL Server Management Studio and SQL*Plus for development and customization.
- Detailed knowledge of and experience in designing, developing and testing software solutions using Java and J2EE technologies.
- Experience in building, deploying and integrating with Ant and Maven.
- Proficiency in programming with different IDEs like Eclipse, NetBeans, JDeveloper and IntelliJ IDEA.
- Experience with various version control systems like CVS, SVN and Git.
TECHNICAL SKILLS:
Big Data Technologies: Hadoop, HDFS, Hive, MapReduce, Pig, Sqoop, Flume, Zookeeper, Cloudera, Amazon EC2, EMR, S3, Redshift
Reporting Tools: Jaspersoft, Qlik Sense, Tableau
Scripting Languages: Perl, Shell, R
Programming Languages: C, C++, Java
Web Technologies: HTML, J2EE, CSS, JavaScript, AJAX, Servlets, JSP, DOM, XML, XSLT.
Application Server: WebLogic Server, Apache Tomcat.
DB Languages: SQL, PL/SQL, Postgres, Paraccel.
NoSQL Databases: HBase, Cassandra
Databases /ETL: Oracle 9i/10g/11g, MySQL 5.2, DB2, Informatica v 8.x, Talend
Operating Systems: Linux, UNIX, Windows 2003 Server
IDEs: Eclipse, NetBeans, JDeveloper, IntelliJ IDEA.
Version Control: CVS, SVN, Git
PROFESSIONAL EXPERIENCE:
Confidential, Atlanta, GA
Sr. Hadoop Developer
Responsibilities:
- Involved in all phases of development activities from requirements collection to production support.
- Migrated data from different RDBMS systems and focused on migrating from the Cloudera distribution to Amazon to reduce project cost.
- Worked with different data feeds such as JSON, CSV and XML and implemented the data lake concept.
- Studied the current system and identified the different sources of data.
- Assisted in managing, acquiring, and analyzing customer data using SQL and R.
- Worked on predictive analytics to monitor inventory levels and ensure product availability.
- Performed Batch processing of logs from various data sources using MapReduce.
- Gained exposure to Amazon Web Services (AWS) cloud computing (EMR, EC2 and S3 services).
- Created a load balancer on AWS EC2 for an unstable cluster.
- As part of a POC, used Amazon S3 as the underlying file system for Hadoop and implemented Elastic MapReduce jobs on the data in S3 buckets. Backed up HBase data to Amazon S3 at off-peak hours.
- Provided pivot graphs to show trends.
- Created Spark SQL queries for faster requests.
- Carried out a POC of Spark integration with Kafka and Cassandra using Java APIs, which could be used for a customer analytics platform.
- Analyzed responses to value-added services based on clients' profiles and purchasing habits.
- Built customized in-memory indexes for high-performance information retrieval of products using Apache Lucene and Apache Solr, providing more precise and useful search data.
- Defined UDFs in Pig and Hive to capture customer behavior.
- Designed and implemented MapReduce jobs to support distributed processing using Java, Hive and Apache Pig.
- Created Hive external tables on the MapReduce output before applying partitioning and bucketing on top of them.
- Maintained data import scripts using Hive and MapReduce jobs.
- Worked on Hive data warehouse modeling to interface with BI tools including Jaspersoft, QlikView and Tableau.
- Administered Hive permissions and user access with Kerberos authentication.
- Developed and maintained several batch jobs to run automatically depending on business requirements.
- Imported and exported data between environments such as MySQL and HDFS and deployed to production.
- Connected Tableau from the client end to AWS IP addresses to view the end results.
- Developed dashboards, reports, ad hoc views and domains in Jaspersoft server and Tableau for business stakeholders.
- Led project teams on Big Data analytics using the Hadoop framework to analyze massive volumes of unstructured biological data and build predictive tools that enabled data scientists to make real-time decisions.
Environment: EMR, Hive, Pig, Datameer, HDFS, Solr, Quartz, Java MapReduce, Maven, Core Java, Git, Jenkins, UNIX, R, MySQL, Eclipse, Oozie, Sqoop, Flume, Jaspersoft, Tableau, QlikView, Cloudera, EC2, S3, Amazon Data Pipeline.
Confidential - Harrisburg, PA
Hadoop Developer
Responsibilities:
- Created Hive tables and loaded retail transactional data from Teradata using Sqoop.
- Loaded home mortgage data from the existing DWH tables (SQL Server) to HDFS using Sqoop.
- Wrote Hive queries to provide a consolidated view of the mortgage and retail data. Also developed a Solr bolt to write and index documents for lightning-fast search.
- Loaded data back into Teradata for reporting and for business users to analyze and visualize using Datameer.
- Orchestrated hundreds of Sqoop scripts, Pig scripts and Hive queries using Oozie workflows and sub-workflows.
- Loaded load-ready files from mainframes into Hadoop and converted them to ASCII format.
- Configured HiveServer2 (HS2) to enable analytical tools like Tableau, QlikView and SAS to interact with Hive tables.
- Developed Pig scripts to move the existing home loans legacy process to Hadoop, with the data fed back to the retail legacy mainframe systems.
- Developed MapReduce programs to write data with headers and footers, and shell scripts to convert the data to a fixed-length format suitable for mainframe CICS consumption.
- Used Maven for continuous build integration and deployment.
- Used Hive to find correlations between customers' browser logs across different sites and analyzed them to build risk profiles.
- Performed end-to-end performance tuning of Hadoop clusters and MapReduce routines against very large data sets.
- Gained exposure to burn-up and burn-down charts, dashboards and velocity reporting of sprint and release progress.
Environment: Hive, Pig, HDFS, Java MapReduce, Solr, Core Java, UNIX, Eclipse, Oozie, Sqoop, Flume
Confidential, Atlanta, GA
Hadoop Developer
Responsibilities:
- Responsible for loading customer data and event logs from MSMQ into HBase using the REST API.
- Responsible for architecting Hadoop clusters with CDH4 on CentOS, managing with Cloudera Manager.
- Monitored the AWS Hadoop cluster using Cloudera Manager to add nodes, decommission dead nodes and track health checks.
- Initiated and successfully completed a proof of concept on Flume for pre-processing, demonstrating increased reliability and ease of scalability over traditional MSMQ.
- Involved in loading data from the Linux file system to HDFS.
- Imported and exported data into HDFS and Hive using Flume.
- Developed Pig UDFs to pre-process the data for analysis.
- Monitored Hadoop cluster job performance, performed capacity planning and managed nodes on Hadoop cluster.
- Used Zookeeper operational services for coordinating cluster and scheduling workflows.
- Proficient in using Cloudera Manager, an end to end tool to manage Hadoop operations.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Wrote MR jobs using various input and output formats, and used custom formats whenever necessary. For example, wrote MR jobs in R using RHIPE (R and Hadoop Integrated Programming Environment).
- Developed workflows in Oozie to automate loading the data into HDFS and pre-processing, analyzing and classifying it using MapReduce, Pig and Hive jobs.
Environment: Cloudera Distribution, CDH4, Flume, HBase, HDFS, Pig, R, MapReduce, Hive, Oozie and Zookeeper.
Confidential, Dallas, TX
Sr. Java & ETL Developer
Responsibilities:
- Involved in requirement gathering, functional and technical specifications.
- Involved in modules like Estate Management, HSS (Health Service Scheme), Uniform, PIS (rmation System) & Generics for Digital Work Flow System (DWFS) project.
- Contributed to the enhancement and maintenance of the application as per the business requirements.
- Fixed existing bugs in various releases and provided a detailed root cause analysis.
- Handled global deployment of the application and coordination between the client, development team and end users.
- Set up users through reconciliations, bulk load and bulk link in all environments.
- Wrote requirements and detailed design documents, designed architecture for data collection.
- Developed web-based UI using MVC architecture, Core Java, Java Collections, JSP, JDBC, Servlets, ANT and XML within a Windows and UNIX environment.
- Increased the performance of the application by optimizing Oracle with Java and J2EE technologies.
- Merged code using the Beyond Compare tool.
- Used Log4j for logging messages.
- Played a major role in optimizing the application to enable it to handle a high volume of traffic.
- Involved in generating Jasper reports.
- Developed test cases and performed unit and integration testing. Performed QA using manual and automated test methodologies with tools like WinRunner and JUnit.
- Worked with Informatica 8.6x and above (Source Analyzer, Mapping Designer, Mapplet Designer, Transformations Designer, Warehouse Designer, Repository Manager, and Workflow Manager/Server Manager). Learned Talend out of personal interest and used it on the project to simplify the work.
Environment: Java, JSP, JDBC, Servlets, Oracle, SQL, PL/SQL, Ant, JUnit, Informatica.
Confidential
Java Developer and ETL Developer
Responsibilities:
- Involved in designing, coding, debugging, documenting and maintaining a number of applications.
- Participated in Java development as part of a cross-program effort.
- Prepared use cases and designed class diagrams and object models.
- Created tables, indexes and stored procedures for data manipulation and retrieval, and performed database modifications using SQL, PL/SQL, stored procedures, triggers and views in Oracle 9i/10g.
- Developed user interface using HTML, CSS, JavaScript, jQuery and the JavaServer Faces UI component framework.
- Integrated JSF with JSPs, JSTL, EL and used JSF Custom Tag Libraries to display the value of variables defined in configuration files.
- Developed POJOs and Java beans to implement business logic.
- Implemented the controller layer using servlets and JSPs.
- Involved in integration testing of the Business Logic layer and Data Access layer.
- Used JDBC to establish connections between the database and the application.
- Used XML for mapping the pages and classes and to transfer data universally among different data sources.
- Responsible for developing and modifying the existing service layer based on the business requirements.
- Involved in designing & developing web-services using SOAP and WSDL.
- Wrote build scripts with Ant for deploying WAR and EAR applications.
- Created/modified shell scripts for scheduling and automating tasks.
- Used Git as the version control system to manage the progress of the project.
- Wrote unit test cases using the JUnit framework.
- Handled requirements and worked in an agile process.
- Familiar with ETL standards and processes; developed ETL logic as per standards for each stage: Source to Flat File, Flat File to Stage, Stage to Work, Work to Work Interim tables, and Work Interim tables to Target tables.
- Prepared ETL scripts for data acquisition and transformation. Developed various mappings using transformations such as Source Qualifier, Joiner, Filter, Router, Expression and Lookup.
- Implemented History Load, Incremental Load and ETL logic using Informatica 8.6/9.1 as per the ETL design document and technical design document. Analyzed existing ETL jobs to understand the flow.
- Developed audit reports at various points of the migration to make sure accounts receivable and payments matched at every stage.
Environment: Java (JDK 1.6), UML, Eclipse, Servlets, JSPs, POJO, Java Beans, JUnit, HTML, CSS, JavaScript, jQuery, XML, SQL, Maven, Web Services, JDBC, JSTL, Oracle 9i/10g, Informatica, Unix, Git.
