Sr. Big Data / Hadoop Consultant Resume
Chevy Chase, MD
PROFESSIONAL SUMMARY
- 14+ years of development experience designing, developing, and maintaining client-server, integration, web-based, and Windows-based applications as a Software Developer and Internet Application Developer.
- 3+ years of experience in Big Data development, Big Data analytics, and Hadoop.
- 8+ years of experience in Java/J2EE development: JSP, Servlets, JDBC, MVC, Web Services.
- Hands on experience in installing, configuring and using ecosystem components like Hadoop MapReduce, HDFS, PIG, HIVE, HBASE, ZooKeeper, Oozie, Cassandra, SQOOP, Flume, Spark, Kafka.
- Expertise in NoSQL databases such as HBase, Cassandra, and MongoDB, and in integrating them with the Hadoop ecosystem.
- Experienced in installing, configuring, and administering Hadoop clusters on major distributions such as Cloudera CDH3/CDH4 and IBM Big Insights.
- Extensive experience with Hadoop MapReduce as a Programmer Analyst, spanning business requirement gathering, analysis, scoping, documentation, design, development, and test case creation.
- Experience writing MapReduce jobs in native Java, Pig, Hive, and Python for various business use cases.
- Experience in administering, installing, configuring, troubleshooting, securing, backing up, performance monitoring, and fine-tuning Linux (Ubuntu/RedHat).
- Excellent Java/J2EE application development skills with strong experience in object-oriented analysis; extensively involved throughout the Software Development Life Cycle (SDLC).
- Strong experience with J2EE, XML, Web Services, WSDL, SOAP, UDDI, and TCP/IP.
- Extensive experience in database design and hands-on experience with large database systems: Oracle 9i, DB2, MySQL, PostgreSQL, and SQL Server.
- Experience in working with different data sources like Flat files, XML files and JSON files.
- Knowledge of cloud-based solutions such as Amazon Web Services (AWS EC2), IBM SoftLayer, Rackspace, and RightScale.
- Exposure to Hadoop 2.0 / YARN, including the ResourceManager, NodeManager, and ApplicationMaster.
- Exposure to Maven/Ant, Git along with Shell Scripting for Build and Deployment Process.
- Knowledge of Mahout machine learning algorithms.
- Good knowledge of writing shell, Python, and Perl scripts on Linux.
- Strong experience in Data Structures - Map, Set, List, Binary Tree and Algorithms - Sort, Search, Merge etc.
- Experience with working in an Agile/SCRUM model
- Experience in writing database objects like Stored Procedures, Functions, Triggers, PL/SQL packages and Cursors for Oracle, SQL Server, MySQL & Sybase databases.
- Good Experience in Windows SharePoint Services, MOSS 2007, SharePoint Designer.
- Strong experience in software design and development with key skills in ASP.NET, ADO.NET, C#.Net, VB.Net, Web Services, SQL Server, stored procedures, XML, and HTML, with good proficiency in web-based and Windows-based applications.
- Excellent domain knowledge in Life Insurance, Property & Casualty Insurance, Finance, and Pharma.
- Roles played: Team Leader, Business Requirements Analyst, System Requirements Analyst
- Experience in all phases of the Software Development Life Cycle, including Waterfall and Agile.
- Excellent communication, presentation, technical writing, and teamwork skills.
TECHNICAL SKILLS
Big Data Eco System: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Sqoop, Flume, Jaql
NOSQL databases: HBase, Cassandra, MongoDB
Cloudera Distribution: Cloudera CDH3, CDH4, Hue Beeswax, Cloudera Manager, Impala
IBM Distribution: IBM Big Insights, Big SQL, Watson Explorer, SPSS
Cloud Solutions: Amazon Web Services (AWS EC2), Elastic MapReduce, Amazon S3, Amazon CloudWatch, Amazon Redshift, IBM SmartCloud, Rackspace and RightScale, Amazon RDS
Big Data Analytics-ETL: R, Mahout, SAS, Netezza, Actuate, Datameer, Pentaho, DataStage, Tableau, Talend, Ab Initio
Change Data Capture: Oracle Golden Gate, IBM CDC tool
Operating System: Windows, UNIX and LINUX (RedHat/Ubuntu)
Relational Databases: SQL Server (SSIS, SSRS), Oracle, DB2, MySQL, PostgreSQL
Web Technologies: HTML, DHTML, XML, ASP, ASP.NET 3.5, ADO.NET, Web Services
Languages: C#, Java, C++, C, COBOL, JavaScript, SQL, Python
Web Server: Microsoft IIS, Apache Tomcat
Version Control: Microsoft Visual Source Safe, TFS, Vault, GIT, Perforce
Internet Technologies: XML, XSLT, XPath, CSS, WCF, WWF, WPF, XAML
Front End Tools: Visual Studio, Eclipse, RStudio
Integration Tools: BizTalk Server, MOSS 2007, InfoPath
Mainframes: DB2 8.0, COBOL II, CICS, FILE-AID, JCL, PVCS, Changeman, IDMS
PROFESSIONAL EXPERIENCE
Confidential - Chevy Chase, MD
Sr. Big Data / Hadoop Consultant
Responsibilities
- Worked on MapReduce as a Programmer Analyst, covering business requirement gathering, analysis, scoping, documentation, design, development, and test case creation.
- Imported and exported data between HDFS and relational database systems using Sqoop.
- Used Change Data Capture tools such as Oracle GoldenGate and IBM CDC to integrate with Hive for incremental loads.
- Extensively used shell scripts and Flume to load data from Kafka messages.
- Used ETL Tools like Pentaho, Data Stage, Tableau, Talend and Ab Initio for data transformations.
- Worked on Watson Explorer to create indexes on HDFS/Hive data for faster retrieval. Also worked with RCFile and Parquet formats.
- Worked on IBM Big Insights and Big SQL. Wrote Apache Pig scripts to process HDFS data and wrote Pig UDFs in Java for use in those scripts.
- Created an HBase query-parser utility in Java for ingesting data into HBase with row IDs and column families (see the ingestion sketch at the end of this section).
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Worked on extending Hadoop framework using UDF/UDAF/UDTFs in HIVE.
- Developed applications with NoSQL databases such as HBase, Cassandra, and MongoDB.
- Wrote automation scripts to monitor HDFS and HBase through cron jobs.
- Created Hive internal/external tables and views using a shared metastore, wrote HiveQL scripts, and performed data transformation and file processing with Pig Latin scripts.
- Unit tested and debugged MapReduce jobs with MRUnit at the mapper, reducer, partitioner, and combiner level (see the test sketch after this list).
- Automated all the jobs for pulling data from FTP server to load data into Hive tables using Oozie workflows.
- Setting up monitoring infrastructure for Hadoop Cluster using Ganglia and Splunk.
- Performed data analysis using the SPSS Modeler tool and Datameer (customer 360-degree view).
- Developed SQL queries against the Oracle database and used JDBC for database connectivity. Used ClearQuest as the bug tracking system and captured logging errors with Log4j.
- Used Perforce for version control.
- Used JIRA for bug tracking, issue tracking and project management.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Iterative and incremental development with agile software development process using Sprints (SCRUM).
- Participated in sprint planning sessions to determine the scope of the sprint, and interacted with the product team on a continuous basis. (Agile)
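A minimal sketch of the kind of MRUnit test referenced above, assuming a hypothetical word-count style mapper; the class names, field names, and sample values are illustrative, not project code.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class EventCountMapperTest {

    // Hypothetical mapper: emits (token, 1) for each whitespace-separated token in the input line.
    public static class EventCountMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }

    private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

    @Before
    public void setUp() {
        mapDriver = MapDriver.newMapDriver(new EventCountMapper());
    }

    @Test
    public void mapperEmitsOneCountPerToken() throws IOException {
        // MRUnit runs the mapper in isolation and verifies the expected key/value pairs.
        mapDriver.withInput(new LongWritable(0), new Text("policy claim"))
                 .withOutput(new Text("policy"), new IntWritable(1))
                 .withOutput(new Text("claim"), new IntWritable(1))
                 .runTest();
    }
}
```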
Environment IBM Big Insights, Big SQL, HDFS, Hive, Pig, HBase, Sqoop, Kafka, Flume, Oozie, ZooKeeper, Ganglia, Linux shell scripting, Pentaho, CDC tools, Eclipse, MRUnit, Java, JIRA.
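A minimal sketch of the row-ID/column-family ingestion described in this section, using the HBase client API of the CDH3/CDH4 era; the table name, row-key format, and column family are illustrative assumptions.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseIngestSketch {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath for ZooKeeper quorum and other settings.
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "customer_events");   // hypothetical table name
        try {
            // Row key and column family/qualifiers are assumptions for illustration.
            Put put = new Put(Bytes.toBytes("cust#00042"));
            put.add(Bytes.toBytes("d"), Bytes.toBytes("event_type"), Bytes.toBytes("LOGIN"));
            put.add(Bytes.toBytes("d"), Bytes.toBytes("event_ts"), Bytes.toBytes("2014-06-01T10:15:00Z"));
            table.put(put);
        } finally {
            table.close();
        }
    }
}
```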
Confidential - Ashburn, VA
Sr. Big Data / Hadoop Consultant
Responsibilities
- Installing, configuring and using ecosystem components like Hadoop MapReduce, HDFS, HBase, ZooKeeper, Oozie, Hive, Cassandra, Sqoop, Pig, and Flume.
- Load and transform large sets of structured, semi structured and unstructured data.
- Built and installed the Cloudera Distribution of Hadoop on 26 nodes with a total of 624 cores, 150 TB of data storage, 416 mappers, 208 reducers, 2 racks, and 1.25 TB of RAM.
- Worked on MapReduce as a Programmer Analyst, covering business requirement gathering, analysis, scoping, documentation, design, development, and test case creation.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Involved in Hadoop Cluster environment administration that includes cluster capacity planning, cluster Monitoring, Troubleshooting, performance tuning.
- Decommissioning and commissioning the Nodes - adding and removing cluster nodes.
- Recovering from a NameNode failure: backup and restore from the Secondary NameNode.
- Configured property files such as core-site.xml, hdfs-site.xml, mapred-site.xml, and hadoop-env.sh based on job requirements.
- Involved in performing upgrades, manage configuration changes, maintain System integrity and monitor Cluster performance in a multi-tenancy environment.
- Moved all crawl data flat files generated from various retailers to HDFS for further processing.
- Created HDFS users and assigned quotas on HDFS file system.
- Wrote Apache Pig scripts to process HDFS data and wrote Pig UDFs in Java for use in those scripts (see the UDF sketch at the end of this section).
- Imported and exported data between HDFS and relational database systems using Sqoop.
- Configured ZooKeeper to coordinate the servers in the cluster and maintain data consistency.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
- Implemented rack topology scripts for the Hadoop cluster.
- Backup of Hive metadata.
- Used Pig as an ETL tool for transformations, event joins, traffic filtering, and pre-aggregations before storing the data in HDFS.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Monitored logs and system health, and ensured data integrity by checking for block corruption with fsck.
- Loaded web server logs into HIVE tables and analyzed for customer interest on products and product support.
- Worked on extending Hadoop framework using UDF/UDAF/UDTFs in HIVE.
- Created external and partitioned tables in Hive for querying purposes.
- Created Hive tables to store the processed results in a tabular format.
- Automated all the jobs for pulling data from FTP server to load data into Hive tables using Oozie workflows.
- Expertise in setting, configuring & monitoring of Hadoop cluster using Cloudera CDH3, CDH4 on Ubuntu.
- Upgraded the Hadoop cluster from Cloudera CDH3 to CDH4, set up a high-availability cluster, and integrated Hive with existing applications.
- Extensively used SoapUI to debug web services in the Eclipse IDE.
- Used GitHub for version control.
- Developed applications with NoSQL databases such as HBase, Cassandra, and MongoDB.
- Experienced in writing MapReduce programs & Eval UDFs for both Hive & Pig in Java.
- Developed Java classes that provide JDBC connectivity to the application with Oracle database
- Designed, developed and deployed application using Eclipse and Tomcat application Server.
- Wrote automation scripts to monitor HDFS and HBase through cron jobs.
- Created Hive internal/external tables and views using a shared metastore, wrote HiveQL scripts, and performed data transformation and file processing with Pig Latin scripts.
- Setting up monitoring infrastructure for Hadoop Cluster using Ganglia and Splunk.
- Performed regression analysis (linear, multivariate) in R and plotted the regression results using the Shiny framework.
- Worked on various Data Structures - Map, Set, List, Binary Tree and Algorithms - Sort, Search, and Merge etc.
Environment Hadoop, HDFS, Hive, Pig, HBase, Sqoop, MongoDB, Flume, Oozie, ZooKeeper, Ganglia, Cloudera CDH3, CDH4, Hue, Linux, Eclipse, MRUnit, Java.
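A minimal sketch of a Pig Eval UDF in Java of the kind referenced above; the class name and the normalization it performs are illustrative assumptions. In a Pig script such a UDF would typically be registered with REGISTER and invoked inside a FOREACH ... GENERATE.

```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical Eval UDF: trims and upper-cases a chararray field.
public class NormalizeField extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        // Guard against empty or null tuples so bad records do not fail the job.
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return input.get(0).toString().trim().toUpperCase();
    }
}
```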
Confidential, Middletown, NJ
Big Data / Hadoop Consultant
Responsibilities
- Worked on administering, installing, configuring, troubleshooting, securing, backing up, performance monitoring, and fine-tuning Linux RedHat.
- Planned, designed, and implemented processing of massive amounts of marketing information, including information enrichment, text analytics, and natural language processing.
- Worked on Data load management, importing & exporting data using SQOOP & FLUME.
- Worked on NoSQL databases such as HBase, Cassandra, and MongoDB, and integrated them with the Hadoop ecosystem.
- Used workflow scheduling and monitoring tools such as Oozie, and enabled high availability of the HBase and Hadoop clusters using ZooKeeper.
- Defined Oozie workflows to kick off Hive and Sqoop jobs in an orderly fashion.
- Wrote automation scripts to monitor HDFS and HBase through cron jobs.
- Provided faster search responses using indexing with the help of Solr and Apache Lucene.
- Implemented Partitioning, Dynamic Partitions, Buckets and Clusters in HIVE.
- Specified the cluster size, allocated resource pools, and distributed Hadoop by writing specification files in JSON format.
- Worked on extending the Hadoop framework using UDFs/UDAFs/UDTFs in Hive (see the UDF sketch at the end of this section).
- Worked on Amazon Web Services (Amazon EC2) and RightScale as cloud-based solutions for Hadoop clusters.
- Worked on Splunk for log analysis and reporting
- Monitored Hadoop cluster environment using Ganglia
- Worked on Hadoop Architecture and the components of Hadoop - MapReduce, HDFS, Job Tracker, Task Tracker, NameNode and Data Node
- Designed and implemented Map Reduce jobs to support distributed data processing.
- Processed data with Pig using LOAD, FILTER, GROUP, JOIN, sort, combine, and split operations.
- Writing, testing, and running Map Reduce pipelines using Apache Crunch.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Involved in installing and configuring Kerberos for the authentication of users and Hadoop daemons
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Iterative and incremental development with agile software development process using Sprints (SCRUM).
- Participated in sprint planning sessions to determine the scope of the sprint, and interacted with the product team on a continuous basis. (Agile)
- Involved in various phases of Software Development such as modeling, system analysis and design, code generation and testing using AGILE Methodology
- Participated in daily Stand up meetings with Scrum Master.
- Designed, developed and deployed application using Eclipse and Tomcat application Server.
- Developed SQL queries against the Oracle database and used JDBC for database connectivity. Used ClearQuest as the bug tracking system and captured logging errors with Log4j.
- Wrote test cases for unit-level testing using JUnit.
- Extensive usage of the Ant build process for delivery of the end product.
Environment Hadoop, HDFS, Hive, Pig, HBase, Sqoop, MongoDB, Cassandra, Flume, Oozie, ZooKeeper, Ganglia, AWS (Amazon EC2), Elastic Map Reduce, Amazon S3, Amazon Cloud Watch, Right Scale, Linux, Eclipse, MRUnit, Java.
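A minimal sketch of a simple Hive UDF of the kind referenced above, using the old-style org.apache.hadoop.hive.ql.exec.UDF API; the class name and behavior are illustrative assumptions. Such a UDF is typically added with ADD JAR and registered via CREATE TEMPORARY FUNCTION before use in HiveQL.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF: trims and lower-cases a string column.
public final class CleanString extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;   // propagate NULLs unchanged
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```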
Confidential, Harrisburg, PA
Sr Enterprise Application Developer
Responsibilities
- Database designing using SQL Server 2008.
- Worked on generating flat files of various formats from disparate applications and processing them through the Policy Management System (PMS) using BizTalk 2009 schemas.
- Writing stored procedures in SQL 2008 and calling them using SQL adapter.
- Created Update datagram to call stored procedures from the SQL Port
- Used SOAPUI to validate the Web Services based on WSDL.
- Installing BizTalk Server, troubleshooting and maintaining the environment.
- Deploying the BizTalk Server Applications on remote servers.
- Unit testing while development and regression testing after development in integration environment.
- Used Visual Source Safe (VSS) and Team Foundation Server (TFS) to track the versions of the Code.
- Performed daily migrations to the Integration, System Test, and Model Office environments; monitored batch runs and created reports using the SharePoint adapter.
- Migrated and upgraded BizTalk Server 2006 to 2010 for renewal applications (migration to VS2010 and .NET Framework 4.0).
- Production and UAT support for nightly cycles.
- Used XMLSpy to evaluate the XPath expressions and to debug the XSLT Transformation.
- Used ImageRight web services to generate images via Exstream calls.
- Set up environments for testing in QA and production support.
Environment Windows 2003 and 2008 Server and Windows XP, .Net Framework 2.0 and 3.5, BizTalk Server 2009, XML, Flat File, Pega Rules Engine, Exstream, ImageRight, SOAPUI, Orchestrations, Pipelines, HAT, C#.Net, Beyond Compare, Ultra Edit, XMLSpy, XPath, VSS and MS SQL Server 2005 and 2008.
Confidential, Bloomington, IL
Enterprise Application Developer
Responsibilities
- BizTalk: Designing schemas, creating maps, designing orchestrations, developing policies in the rules engine, and testing and debugging through HAT.
- Architect, design, and develop the BizTalk infrastructure and data processing environment.
- Used WCF services and external assemblies in orchestrations using BizTalk.
- Calling web services in orchestration and exposing orchestration as web service using Biztalk Web services publishing wizard.
- Implemented real time solutions using RFID
- Developing Custom pipelines and Custom functoids.
- Creating EDI HIPAA 835, 837, HL7 schemas using EDIFACT, EDI X12
- Exception handling using catch blocks in long-running transactions.
- Involved in creating tables, stored procedures in SQL Server 2005, DB2 for data manipulation and retrieval.
- Used Dataset, Data Table, Data Adapter, Data Reader and other ADO.NET connectivity controls extensively.
- Built Web services and consumed using web services
- Worked on XML, XPath, XSLT, WCF, WPF(XAML), WWF, LINQ
- Designed DTS/SSIS Packages to transfer data between servers, load data into database, archive data file from different DBMSs using SQL Enterprise Manager/SSMS on SQL Server 2000/2005 environment
- Created Reports using SQL server reporting services (SSRS)
- Made use of the Master Pages, Navigation Controls, Profile and Cross Page Postback, the new features of ASP.NET 2.0.
- Worked in New business submission process and content management for Home Insurance.
- Participated in mid-term code reviews and final-code reviews.
- Responsible for fixing and trouble-shooting various issues in production
- Responsible for the debugging, testing and implementation of different modules.
- Used Team Foundation Server for version control
- SharePoint: Integrating with the InfoPath, SharePoint Designer and Visual Studio to create the forms, develop workflows and for handling events for the lists in the SharePoint site respectively.
- Creating the SharePoint site, Site collection, Lists, Document libraries Form libraries and views in MOSS.
- Customizing the Site using SharePoint Designer.
Environment Windows 2003 Server, C#, VB.NET, ASP.NET, ADO.NET, IIS, SQL Server, DB2, TFS, XML, XSLT, XPath, BizTalk 2006 R2, EDI, InfoPath, RFID, WCF, WWF, WPF, XAML, LINQ, SSIS, SSRS, MOSS 2007, WSS 3.0, SharePoint Designer
Confidential, Roseland, NJ
Enterprise Application Developer
Responsibilities
- Developing and customizing the Send/Receive Pipelines
- Developed a custom encryption component for custom send pipeline
- Involved in developing the application using C# from ASP.NET.
- Designed and created tables, views, and stored procedures in SQL Server
- Managing access rights based on user’s hierarchy and roles and user rights.
- Used ADO.NET in connecting to Data Access management with SQL Server 2000
- Performed Various Validations as per Various Customer Eligibility Criteria
- Writing presentation layer and User controls using C#, Business Logic and Data Access components using C# classes
- Generated and modified resource files for embedding resources as well as for localization purposes.
- Designed and developed transactional BizTalk Orchestrations.
- Manage daily BizTalk feed processing and error logging.
- Installing, configuring and testing BizTalk server functionality including orchestrations.
- Used XML Web services.
- Bug Fixing and Enhancements based on the Change Request
- Participated in mid-term code reviews and final-code reviews.
- Implemented version controlling using Visual Source Safe.
- Worked with QA and Configuration teams on testing for each release in different environments.
- Prepare test strategies, carried out Unit Testing, Functionality Testing, Stress Testing
- Developed front end user interface using JAVA AWT, Swings, XML.
- Worked on client-side validations using JavaScript.
- Developed operational plan for ETL, Data Loading and data cleaning process and wrote scripts for automation using shell scripting.
- Automate build process by writing ANT build scripts.
- Configured and customized logs using Log4J.
- Involved in installing and configuring Eclipse and Maven for development.
- Used Log4J to validate functionalities and JUnit for unit testing.
- Developed Java classes that provide JDBC connectivity from the application to the Oracle database (see the sketch at the end of this section).
- Designed, developed and deployed application using Eclipse and Tomcat application Server.
- Designed classes using object-oriented design (OOD) concepts such as encapsulation and inheritance.
- Involved in unit integration, bug fixing, acceptance testing with test cases, code reviews.
- Developed SQL queries against the Oracle database and used JDBC for database connectivity. Used ClearQuest as the bug tracking system and captured logging errors with Log4j.
- Written Test Cases for Unit Level Testing using JUnit.
- Extensive usage of ANT builds process for the delivery of the end product.
- Applied Operating system updates, patches and configuration changes
- Added, removed, and updated user account information; reset passwords; etc.
- Ran cron jobs to back up data.
- Monitored System Metrics and logs for any issues
Environment Java Servlets, JDBC, PL/SQL, XML, Log4j, JUnit, SVN, ANT, C#, ASP.NET, VB.NET, ADO.NET, XML, SQL Server, PVCS, IIS, Test Director, Quality Center, Quick Test Professional, Linux, Oracle.
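A minimal sketch of the JDBC-to-Oracle connectivity classes mentioned above; the connection URL, credentials, table, and column names are placeholders rather than project values.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class CustomerDao {

    // Placeholder connection settings for illustration only.
    private static final String URL = "jdbc:oracle:thin:@//dbhost:1521/ORCL";
    private static final String USER = "appuser";
    private static final String PASSWORD = "changeit";

    public String findCustomerName(long customerId) throws SQLException {
        String sql = "SELECT name FROM customers WHERE customer_id = ?";
        // try-with-resources closes the connection, statement, and result set automatically.
        try (Connection conn = DriverManager.getConnection(URL, USER, PASSWORD);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, customerId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("name") : null;
            }
        }
    }
}
```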
Confidential
Mainframes Developer
Responsibilities
- Gathering the business process on the existing system
- Migrating the source from ADS/O to CICS
- Preparation of design Documents
- Coding in COBOL and CICS with DB2
- Implemented DB2 cursors and stored procedures
- Prepare Test plan
- Unit Testing
- Testing the converted source ( 3270 Testing)
- Stub Testing
- UAT Support
Environment DB2, COBOL, CICS, IDMS, ADSO, Changeman, MVS - OS/390, FILE-AID, JCL, Test Director