Sr. Big Data/Hadoop Developer Resume
Austin, TX
SUMMARY
- Passionate about IT with 9+ years of professional experience building scalable, data-driven applications leveraging a vast array of technologies including Mainframe, ETL, Teradata, Java, Analytics, and Hadoop; actively seeking opportunities in Analytics and BI.
- Over 5 years of experience in the Hadoop/Big Data ecosystem - HDFS, MapReduce, Hive, Sqoop, Greenplum, Java, and mainframes.
- Proficient in the integration of various data sources and targets involving multiple relational databases like MS SQL Server, DB2, COBOL, XML files, and flat files (fixed width, delimited) into the staging area of an ODS, Data Warehouse, or Data Mart.
- Experience in building applications in Mainframes.
- Expert in ingesting data into the Big Data ecosystem.
- Working knowledge of ETL and BI tools like Informatica and Teradata.
- Working knowledge of C.
- Hands-on experience in application development and database management using Java, RDBMS, Linux/Unix shell scripting, and Linux internals.
- Working knowledge of statistical analytic tools like R.
- Proficient in MapReduce Java and streaming APIs, Ruby, and Python.
- Have profound knowledge in Retail Merchandise Assortment/Planning Operations/Pricing Operations.
- Have working experience on Agile Methodologies.
- Involved in Analysis, Design, Coding and Development of Java custom Interfaces.
- Working knowledge of Cloud Computing and AWS.
- Knowledge in preparing Support Documents, BPD Design Diagram, mapping documents, and requirement traceability matrix.
- Experience in extensive development of environments and applications for web deployment using Java, J2EE, JDBC, Servlets, JSP, JavaBeans, JavaScript, HTML, XHTML/DHTML & XML, and AngularJS 1.3.
- Have experience in leading the team.
- Experience in automation and process improvements.
- Proficient in Teradata database design, application support, performance tuning, optimization, and setting up test and development environments.
- Good Knowledge in Bash shell Scripting.
- Extensive exposure to all aspects of the Software Development Life Cycle (SDLC), i.e. requirements definition for customization, prototyping, coding (COBOL, DB2, Java), and testing.
- Exposure to Java development projects.
- Created custom monitoring dashboards for tracking sales and other important business insights.
- Hands-on experience in AWS support.
TECHNICAL SKILLS
Operating Systems: Z/OS, Windows, UNIX, Linux, Solaris.
Scripting Languages: Python, Bash shell scripting, SQL, R, JavaScript, HTML, XML, AngularJS 1.3, GulpJS, Browserify, jQuery; IDE: NetBeans.
Databases: DB2, Teradata, MySQL, Microsoft Access, SQL Server.
Big Data Tools: Hive, Pig, MapReduce, Sqoop, Spark, HDFS, YARN, Navigator, TOAD.
Reporting Tools: Business Objects 6.5/6.1, Crystal Reports, Cognos 8
Web technologies: HTML, CSS, Macromedia Dreamweaver
Middleware: Cloudera, Web Services
Programming Languages: Oracle SQL, PL/SQL, UNIX shell scripts
Industry Verticals: Retail, Banking, Manufacturing, Healthcare, Telecom
ETL Tools: Informatica PowerCenter 9.1/8.6/8.1 (Designer, Workflow Manager, Workflow Monitor, Repository Manager, and Informatica Server)
PROFESSIONAL EXPERIENCE
Confidential - Austin, TX
Sr. Big data/Hadoop developer
Environment: Hadoop, CDH 5.5.1, HDFS, MapReduce, Ruby, Hive, Pig, Oozie, HBase, Shell Scripting, Java, MySQL, DB2, Oracle.
Responsibilities:
- Worked on analyzing the Hadoop stack and different big data analytic tools including Pig, Hive, the HBase database, and Sqoop.
- Worked on data utilizing a Hadoop, ZooKeeper, and Accumulo stack, aiding in the development of specialized indexes for performant queries on big data implementations.
- Involved in Design and Development of technical specifications using Hadoop technology.
- Involved in moving all log files generated from various sources to HDFS for further processing.
- Wrote Apache Pig scripts to process the HDFS data.
- Wrote Sqoop pipelines to efficiently transfer data from MySQL, DB2, Oracle Exadata, and Netezza to the Hadoop environment.
- Moved all meter data generated from various meter routers from data lakes to HDFS for further processing.
- Experience with data center transition to AWS cloud services.
- Worked with the production environment on AWS, high-availability practices, and deploying backup/restore infrastructure.
- Performed the ETL (Extract, Transform, Load) process, wrote Ruby scripts, and loaded the data into the target database.
- Extracted data from AWS Redshift, MongoDB, and Hive for analysis and reporting.
- Worked on SugarCRM modules, writing crons, web services, APIs, Java, Python, PHP, MySQL, jQuery, Ajax, JSON, etc.
- Developed MapReduce programs in Java to convert from JSON to CSV and TSV formats and perform analytics.
- Used Pig as an ETL tool to perform transformations, event joins, and pre-aggregations before storing the curated data in HDFS.
- Involved in migration of ETL (Informatica) processes from relational databases to Hive to enable easy data manipulation.
- Performed CRUD operations using the HBase Java client API.
- Created Hive tables to store the processed results in a tabular format.
- Monitoring Hadoop scripts which take the input from HDFS and load the data into Hive.
- Created external tables in Hive.
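As an illustration of the JSON-to-CSV conversion work above, a minimal Hadoop Streaming-style mapper in Python might look like the sketch below (the field names are hypothetical; the real schema depended on the source feed):

```python
import json
import sys

# Hypothetical output column order for the CSV rows.
FIELDS = ["id", "name", "amount"]

def json_to_csv(line):
    """Convert one JSON record to a CSV row, leaving blanks for missing keys."""
    record = json.loads(line)
    return ",".join(str(record.get(f, "")) for f in FIELDS)

if __name__ == "__main__":
    # Hadoop Streaming feeds records on stdin, one JSON object per line.
    for line in sys.stdin:
        line = line.strip()
        if line:
            print(json_to_csv(line))
```

The same mapper doubles for TSV output by swapping the join delimiter.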
Confidential, Bethesda, MD
Sr. Big data/Hadoop developer
Environment: Hadoop, HDFS, Apache Spark, Ruby, Pig, Hive, MapReduce, Sqoop, Oozie, Nagios, Ganglia, LINUX, Hue.
Responsibilities:
- Worked on analyzing the Hadoop cluster using different big data analytic tools including Pig, Hive, and MapReduce.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Created HBase tables to store various data formats of PII data coming from different portfolios.
- Implemented test scripts to support test driven development and continuous integration.
- Worked on tuning the performance of Pig queries.
- Developed full-stack web applications in Ruby on Rails.
- Involved in loading data from the Linux file system to HDFS.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experience working on processing unstructured data using Pig and Hive.
- Stored processed tables in Cassandra from HDFS for applications to access the data in real time.
- Gained experience in managing and reviewing Hadoop log files.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Assisted in monitoring the Hadoop cluster using tools like Nagios and Ganglia.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
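The Flume-collected log data above fed downstream summaries; a simplified Python sketch of the kind of log-level aggregation run over the staged files (the `TIMESTAMP LEVEL message` line format is an assumption) could be:

```python
from collections import Counter

def count_levels(log_lines):
    """Tally log lines by severity level, assuming 'TIMESTAMP LEVEL message'."""
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) >= 2:
            counts[parts[1]] += 1  # second token is the severity level
    return dict(counts)
```

In practice this logic ran as a Pig/Hive job over HDFS rather than in-process.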
Confidential - Arlington, VA
Sr. Hadoop Developer
Environment: HDFS, Map Reduce, HIVE, Teradata, Spark, HBase, Sqoop, PIG
Responsibilities:
- Application installation of Hadoop, Hive, Spark, MapReduce, and Sqoop.
- HDFS support and maintenance, including adding/removing nodes and data rebalancing.
- Developed MapReduce application using Hadoop, MapReduce programming and Hbase.
- Involved in developing the Pig scripts.
- Involved in developing the Hive and Spark Reports.
- Provisioned Dev/Stage/Production environments using DigitalOcean droplets and AWS Cloud.
- Managed AWS infrastructure via the automation tool Terraform for AWS platform setup/management.
- Extensively worked with cutting-edge technologies like Big Data and AWS computing.
- Designed high level ETL architecture for overall data transfer from the OLTP to OLAP.
- Unit tested and tuned SQLs and ETL Code for better performance.
- Monitored the performance and identified performance bottlenecks in ETL code.
- Drove POC initiatives to assess the feasibility of different traditional and big data reporting tools with the data lake (Spotfire, BO, Tableau, etc.).
- Extracted the data from Teradata into HDFS using Sqoop.
- Exported the patterns analyzed back to Teradata using Sqoop.
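The Teradata-to-HDFS transfers above were driven by Sqoop; a small Python helper sketching how such an import invocation might be assembled (the connection details and table name are placeholders, not the actual project values):

```python
def build_sqoop_import(jdbc_url, table, target_dir, num_mappers=4):
    """Assemble a Sqoop import command line as an argument list."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,        # e.g. a Teradata JDBC URL
        "--table", table,             # source table to pull
        "--target-dir", target_dir,   # HDFS destination directory
        "--num-mappers", str(num_mappers),  # parallel import tasks
    ]
```

The export back to Teradata used the mirror-image `sqoop export` with `--export-dir`.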
Confidential, Fresno, CA
Hadoop Developer
Environment: Hadoop Cluster, HDFS, Hive, Pig, Sqoop, Linux, Hadoop MapReduce, HBase, Shell Scripting, MongoDB, and Cassandra.
Responsibilities:
- Installed/configured/maintained Apache Hadoop clusters for application development, along with Hadoop tools like Hive, Pig, HBase, Zookeeper, and Sqoop.
- Wrote the shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System); developed multiple MapReduce jobs in Java for data cleaning.
- Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging the data in HDFS for further analysis.
- Worked on installing cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
- Developed PIG Latin scripts to extract the data from the web server output files to load into HDFS.
- Used Pig as an ETL tool to do transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Wrote test cases and analyzed and reported test results to product teams.
- Responsible for developing a data pipeline using Azure HDInsight, Flume, Sqoop, and Pig to extract the data from weblogs and store it in HDFS.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs; used Sqoop to import and export data between HDFS and RDBMS for visualization and to generate reports.
- Good experience with Python, Pig, Sqoop, Oozie, Hadoop Streaming, and Hive.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
- Responsible for building scalable distributed data solutions using Hadoop; responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and log files.
- Developed several new MapReduce programs to analyze and transform the data to uncover insights into the customer usage patterns.
- Worked extensively with importing metadata into Hive using Sqoop and migrated existing tables and applications to work on Hive.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
- Handled importing data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
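The daemon health checks above were shell scripts; their core logic can be sketched in Python by diffing expected Hadoop daemons against `jps` output (the daemon set assumed below is a classic HDFS deployment, not the project's exact topology):

```python
# Assumed daemon set for a classic HDFS node; real checks varied by node role.
EXPECTED_DAEMONS = {"NameNode", "DataNode", "SecondaryNameNode"}

def missing_daemons(jps_output, expected=EXPECTED_DAEMONS):
    """Return expected daemons absent from `jps` output ('pid Name' per line)."""
    running = {
        line.split()[1]
        for line in jps_output.splitlines()
        if len(line.split()) >= 2
    }
    return sorted(expected - running)  # non-empty result triggers an alert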
Confidential - San Jose, CA
Java Developer
Environment: Java 1.6, J2EE, Agile, Spring 3, Hibernate, WebLogic 10.3, PL/SQL, Oracle 11g, Eclipse Kepler, JSF 2.2, AngularJS, Bootstrap, RichFaces v4, TDD, Web Services, JavaScript, log4j, Maven, Struts, JUnit, JMS, and SVN.
Responsibilities:
- Actively involved in mock screens development and project start-up phase.
- Used Agile (SCRUM) methodologies for software development.
- Developed UI Layer for the application using AngularJS, HTML5, CSS3, and Bootstrap.
- Involved in building a single page and cross browser compatible web application using AngularJS (Angular routing) and bootstrap.
- Created managed beans used to instantiate backing beans in a JSF application and store them in a scope.
- Performed Use case design, object modeling using UML, like transformation of the Use Cases into Class Diagrams, Sequence Diagrams.
- Used the Spring Framework at the business tier and Spring's BeanFactory for initializing services.
- Developed AJAX functionality using Ajax4jsf tag libraries, implementing AJAX for asynchronous calls.
- The framework leverages JSF features like event handling, validation mechanisms, and state management.
- Developed Struts based presentation layer, Hibernate based DAO layer and integrated them using Spring Dependency injection, ORM and Web modules.
- Used Spring Core Annotations for Dependency Injection.
- Developed Backing beans to handle UI components state and stores that state in a scope.
- Involved in writing the database integration code using Hibernate.
- Extensively used JSF Core and HTML tags and FLEX for UI development. Used Spring Framework with Hibernate to map to Oracle database.
- Built the application using the TDD (Test-Driven Development) approach.
- Used Oracle as the database, involved in the development of the PL/SQL backend implementation, and created Select, Update, and Delete statements in SQL.
Confidential, Dublin, OH
ETL/Teradata Developer
Environment: Informatica Power Center 9.1, UNIX,Teradata, Oracle 9i, MS SQL Server 2000,Cognos, TWS
Responsibilities:
- Did the majority of the ETL design, including process specifications, source-to-target mappings, and selection of the appropriate Teradata extract/load tools.
- Prepared technical specifications for the development of Informatica (ETL) mappings to load data into various target tables and defined ETL standards.
- Developed the mappings using needed Transformations in Informatica tool according to technical specifications.
- Extracted data from SQL server Source Systems and loaded into Oracle Target tables.
- Implemented SCDs (Slowly Changing Dimensions) Type I and Type II for data loads.
- Built reusable transformations and mapplets wherever reuse was needed to avoid redundancy.
- Loaded the data from the 3NF layer into the fact tables on a daily basis.
- Created views on fact and dimension tables to be used in reporting.
- Analyzed workflow, session, event, and error logs for troubleshooting the Informatica ETL process.
- Unit tested ETL scripts in the UNIX environment and documented unit test results.
- Performance-tuned ETL batch jobs, reporting, and ad-hoc SQL.
- Carried out various checks on loaded data for integrity-related data issues.
- Wrote FastLoad and MultiLoad scripts to load the data from flat files to staging.
- Wrote BTEQ scripts to load data into dimension tables with the hierarchy.
- Created stored procedures, macros, triggers, views, tables and other database objects.
- Validated the source and target data using Teradata as per business requirements.
- Created the low-level design document.
- Created UNIX cron jobs for ETL batch scheduling.
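The Type II slowly changing dimension loads above follow the standard expire-and-insert pattern; a minimal Python sketch of that logic (the column names `key`, `start_date`, `end_date` are illustrative, not the actual warehouse schema):

```python
def apply_scd2(dim_rows, key, new_attrs, load_date):
    """SCD Type II: expire the current row for `key`, then append the new version.

    Rows are dicts; the open (current) version has end_date None.
    """
    for row in dim_rows:
        if row["key"] == key and row["end_date"] is None:
            row["end_date"] = load_date  # close out the old version
    new_row = dict(new_attrs, key=key, start_date=load_date, end_date=None)
    dim_rows.append(new_row)
    return dim_rows
```

In the warehouse this was expressed as an Informatica Update Strategy plus insert, not Python, but the row lifecycle is the same.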
Confidential
Data warehouse Developer
Environment: Informatica 5.1/6.2, Teradata V2R5, Oracle 8i, Teradata SQL Assistant, FastLoad, MultiLoad, Teradata Administrator, BTEQ, UNIX, DB2, COBOL.
Responsibilities:
- Designed the ETL architecture for Oracle data extraction and loading the Teradata base tables.
- Extracted Data from different sources by using Informatica.
- Analyzed the source data and unloaded source data into flat files.
- Executed sessions and workflows to run the data loads.
- Designed and developed mappings in Informatica using transformations such as Lookup, Update Strategy, Expression, Joiner, etc.
- Responsible for tuning ETL procedures and star schemas to optimize load and query performance.
- Played a major role in the selection of Teradata load utilities based on the nature of the data.
- Responsible for coding and executing SQL scripts for Oracle data extraction.
- Solely responsible for source data analysis and formulation of the extract strategy.
- Contributed to the preparation of ETL specifications for partial subject areas through continual interaction with the data modeler.
- Suggested data type and other table modifications to physical and logical model.
- Created staging tables and test base tables for enhancements to the physical model.
- Created BTEQ scripts with data transformations for loading the base tables.
- Standardized the ETL scripts, catering to the error-table handling process and load statistics collection per organizational IT standards.
- Documentation of scripts, specifications and other processes.
- Responsible for testing and fixing all issues arising from data validation process.
- Loaded staging tables on Teradata and further loaded target tables on Teradata via views.
- Creating load process to perform aggregations and load base tables.
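The aggregate-and-load step above boils down to a group-and-sum over the staged rows; a minimal Python sketch (the `store`/`sales` column names are illustrative only):

```python
from collections import defaultdict

def aggregate(rows, group_key, measure):
    """Sum `measure` per `group_key`, mimicking the base-table aggregation load."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += row[measure]
    return dict(totals)
```

In production this ran as SQL aggregation inside Teradata before the base-table insert.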
