Sr. Hadoop-Spark Consultant Resume
SUMMARY
- 14+ years of technical experience in development and design, including 5 years in Big Data (Hadoop, Spark) technologies.
- Extensive experience in a petabyte-scale data environment with a daily inflow of 20-30 TB, supporting data analysis.
- Extensive experience in data migration, data preprocessing, validation, data analysis, and dashboard creation.
- Developed a tool to migrate data from Oracle, SQL Server, and DB2 to HBase.
- Performed data analysis using Hive, Pig, Spark, Spark SQL, DataFrames, and Spark MLlib.
- Worked on data quality metrics, data lake implementation, and data validation in HDFS.
- Extensive experience with the full SDLC, with proficiency in mapping business requirements, application design, development, integration, testing, technical documentation, and troubleshooting for mission-critical applications.
- Extensive experience in Agile (Scrum) and Waterfall methodologies.
- Experienced in quality management techniques using SEI CMM-based processes during all phases of the project life cycle.
- Excellent communication, leadership and interpersonal skills.
- Worked extensively in an onsite-offshore delivery model as a lead and coordinator.
TECHNICAL SKILLS
Domain: Telecom, Financial Market, Energy and Utility and Retail
Methodologies: Agile (Scrum), Waterfall
Big data: Cloudera, Hortonworks, Hadoop
Hadoop: HDFS, Pig Latin, Hive, Oozie, Tez, Spark, Spark SQL, DataFrames
Data Migration: Sqoop, Flume, Spark Streaming, Kafka
Advanced Tools & Tech: Hue, Splice, Paxata, WhereScape, Informatica, Tableau, Talend
Database: NoSQL (HBase, Cassandra), DB2, SQL Server, Teradata, Oracle
Data Modeling: Data Studio, SQL Developer, Aginity, DBeaver
Languages: Python, Scala, SQL, Java
Administrative: Cloudera Manager, Ambari
Work Tracking: JIRA, TFS, Agile Scrum, Rational Project Management (RPM), Rational Team Concert (RTC)
Incident Management: Remedy, Peregrine, HPQC, HP ALM
Job Schedulers: Oozie, Active Batch, AutoSys, Control-M
Configuration: Ant, Maven, GIT
IDE: IntelliJ IDEA, Eclipse
Data Exchange: XML, JSON, AVRO, Parquet, ORC
Operating System: Linux, Unix, Windows 10/8/7/XP/2000, MVS-OS/390, z/OS
Other Tools/Technologies: Zookeeper, Splice, CICS, PL/1, COBOL, JCL, Easytrieve, REXX, VSAM, ENDEVOR, Librarian, SCLM, Changeman, FILE-AID, IBM File Manager, Rapid SQL, DB2 Command Line, Platinum, Control-M, Lotus Notes, Outlook, MS Office, VMware, VirtualBox
PROFESSIONAL EXPERIENCE
Confidential
Sr. Hadoop-Spark Consultant
Technology: Python, Spark, Spark SQL, Spark DataFrames, Hive, HBase, Shell Script, Ambari, Scala, Kafka, AutoSys, Talend, MQ
Responsibilities:
- Coded data analytics models using PySpark, Spark Streaming, Hive, Pig, and HBase.
- Created and implemented various analytical models.
- Built ETL pipelines (developed with Python, Spark, and a Sqoop framework), including data validation and visualization.
- Created DAGs in Talend to load data from Oracle sources into the data lake.
- Used Spark DataFrames for faster joins across tables in the NoSQL (HBase) database; see the sketch after this list.
- Involved in a multi-tier, multimodal implementation (Auto Subrogation) in which the project interacts with Hadoop, SAS, and Claim Center (a J2EE-based application) via MQ.
- Optimized read/write performance.
- Worked on Ambari and AutoSys for monitoring and selected administration activities.
- Responsible for mentoring and leading practitioners.
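A minimal sketch of this kind of DataFrame join, assuming the HBase tables are already exposed to Spark as Hive external tables; the table and column names (policy, claim, policy_id) are illustrative, not from the project:

```scala
// Minimal sketch: joining two HBase-backed tables via Spark DataFrames.
// Assumes Hive external tables (policy, claim) are mapped onto the
// underlying HBase tables; all names are illustrative assumptions.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object HBaseJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hbase-dataframe-join")
      .enableHiveSupport()
      .getOrCreate()

    val policies = spark.table("policy") // Hive table over HBase
    val claims   = spark.table("claim")  // Hive table over HBase

    // Broadcast the smaller side so the join avoids a full shuffle
    // of the large HBase-backed table.
    val joined = policies.join(broadcast(claims), Seq("policy_id"))

    joined.write.mode("overwrite").saveAsTable("policy_claims")
    spark.stop()
  }
}
```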
Confidential, Bellevue, WA
Sr. Hadoop-Spark Consultant
Technology: Spark, Spark SQL, Spark DataFrames, MLlib, Hive, HBase, Shell Script, Ambari, Scala, Kafka, Active Batch, Kerberos, YARN
Responsibilities:
- Worked on various analytical POCs: identifying auto intenders, identifying customers shopping with other providers, likely-churner analysis, market-wise Facebook usage, external user network, influencer analytics, inferred moving, inferred network auto intenders, inferred age, and inferred gender based on first name.
- Worked on ETL, data validation, and visualization using Kafka, Spark Streaming, Scala, Hive, and Tableau; a streaming sketch follows this list.
- Used Spark DataFrames for faster joins across tables in the NoSQL database.
- Coded Spark/Scala classes for data analytics across various modules.
- Queried tables ranging from 40 to 100 TB in size.
- Worked on Ambari for monitoring and selected administration activities.
- Worked as lead, responsible for mentoring and leading practitioners.
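A minimal Structured Streaming sketch of the Kafka-to-HDFS leg of such a pipeline (requires the spark-sql-kafka package on the classpath); the broker address, topic name, and output paths are illustrative assumptions:

```scala
// Minimal sketch of a Kafka -> Spark Structured Streaming ETL step.
// Broker, topic, and paths are illustrative assumptions.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object KafkaEtlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-etl").getOrCreate()

    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "usage-events")
      .load()

    // Kafka delivers key/value as binary; cast the payload to string
    // and keep the event timestamp for downstream windowing.
    val events = raw.select(
      col("value").cast("string").as("payload"),
      col("timestamp"))

    val query = events.writeStream
      .format("parquet")
      .option("path", "/data/landing/usage-events")
      .option("checkpointLocation", "/data/checkpoints/usage-events")
      .start()

    query.awaitTermination()
  }
}
```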
Confidential
Sr. Hadoop Developer/Lead
Technology: Splice, Paxata, WhereScape, Sqoop, Oozie, HDFS, Eclipse, HBase, Shell Script, Hue, Hive, Spark, Scala, Active Batch
Responsibilities:
- Moved data from more than 75 databases comprising over 2,000 tables into HBase using Splice.
- Performed data modeling on HBase for normalization and faster data retrieval.
- Used Spark DataFrames for faster joins across tables in the NoSQL database.
- Coded Spark/Scala classes for standard validation checks driven by configuration files; see the sketch after this list.
- Worked on Cloudera Manager administration: adding/removing nodes, user administration, issue resolution, adding new services, and version and Cloudera upgrades.
- Worked on the EDH (enterprise data hub) to bring data into a central hub for faster, more reliable data transfer and validation into HDFS for data analysis.
- Worked on a tool to ingest multiple databases and tables into the EDH in a single run.
- Developed workflows to schedule various Hadoop programs using Active Batch.
- Worked as lead, responsible for work assignment and leading four offshore practitioners.
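A minimal sketch of a configuration-driven validation check in Spark/Scala; the rule shape and the table/column names are illustrative assumptions, with the rules shown inline rather than parsed from a real config file:

```scala
// Minimal sketch of configuration-driven validation checks in Spark/Scala.
// Rule shape, table names, and column names are illustrative assumptions.
import org.apache.spark.sql.{DataFrame, SparkSession}

object ValidationSketch {
  // Each rule names a column that must be non-null in a given table;
  // in practice the rules would be parsed from a configuration file.
  final case class Rule(table: String, notNullColumn: String)

  def violations(df: DataFrame, rule: Rule): Long =
    df.filter(df(rule.notNullColumn).isNull).count()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("config-driven-validation")
      .enableHiveSupport()
      .getOrCreate()

    val rules = Seq(Rule("customer", "customer_id"), Rule("orders", "order_id"))

    rules.foreach { r =>
      val bad = violations(spark.table(r.table), r)
      println(s"${r.table}.${r.notNullColumn}: $bad null values")
    }
    spark.stop()
  }
}
```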
Confidential, Thousand Oaks, CA
Sr. Hadoop Developer
Technology: Hive, Spark, Scala, Pig Latin, MapReduce, Sqoop, Oozie, HDFS, Eclipse, Cassandra, Shell Script, Hue
Responsibilities:
- Worked on a DQM POC and implementation aimed at faster, more reliable data transfer and validation into HDFS for data analysis.
- Coded Spark/Scala classes for validation checks (standard, file, and business checks) driven by manifest files.
- Processed data received in various formats: Avro, Parquet, bz2, zip, SequenceFile, and text files.
- Created data partitions and buckets in Hive; see the partitioning sketch after this list.
- Developed workflows to schedule various Hadoop programs using Oozie.
- Worked as lead, responsible for work assignment and leading five offshore practitioners.
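A minimal sketch of creating and loading a partitioned Hive table through Spark SQL; the table and column names (sales, staging_sales, sale_date) are illustrative assumptions:

```scala
// Minimal sketch of a partitioned Hive table loaded via Spark SQL.
// Table and column names are illustrative assumptions.
import org.apache.spark.sql.SparkSession

object HivePartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partitioning")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("""
      CREATE TABLE IF NOT EXISTS sales (
        order_id BIGINT,
        amount   DOUBLE
      )
      PARTITIONED BY (sale_date STRING)
      STORED AS ORC
    """)
    // Bucketing would add: CLUSTERED BY (order_id) INTO 32 BUCKETS,
    // with bucketed loads typically run through Hive itself.

    // Dynamic partition insert: each sale_date value lands in its
    // own partition directory.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
      INSERT OVERWRITE TABLE sales PARTITION (sale_date)
      SELECT order_id, amount, sale_date FROM staging_sales
    """)
    spark.stop()
  }
}
```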
Confidential
Hadoop Lead
Technology: Hive, Pig Latin, MapReduce, Sqoop, Spark, Oozie, Cassandra, HDFS, Eclipse, Hue
Responsibilities:
- Worked on a Customer Door Ship POC intended to reach customers ahead of demand, enable faster/same-day delivery, add value for customers, and ultimately increase revenue.
- Responsible for logical and physical design of Cassandra tables.
- Worked on nodetool, CQL, and DevOps tasks.
- Performed data modeling for Cassandra tables.
- Managed data coming from different sources.
- Imported and exported data to and from HDFS using Sqoop and Flume.
- Created data partitions and buckets for better Hive query performance.
- Wrote merge algorithms for incremental data updates; a Spark sketch follows this list.
- Wrote Pig (Pig Latin) scripts for ad hoc data retrieval.
- Developed workflows to schedule various Hadoop programs using Oozie.
- Involved in analysis, design, and testing phases, and responsible for documenting technical specifications.
- Developed and supported MapReduce programs and jobs for data-cleansing features such as schema validation, filtering, joins, and row counts.
- Worked as lead, responsible for work assignment and leading four practitioners.
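A minimal Spark/Scala sketch of one common incremental-merge approach (union the base and delta feeds, then keep the latest row per key); the table, key, and timestamp names are illustrative assumptions:

```scala
// Minimal sketch of an incremental merge (upsert) of a delta feed into
// a base table with Spark/Scala. Names are illustrative assumptions.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object IncrementalMergeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("incremental-merge")
      .enableHiveSupport()
      .getOrCreate()

    val base  = spark.table("customer_base")
    val delta = spark.table("customer_delta")

    // Union the two feeds, then keep the most recent row per key.
    val latestFirst = Window
      .partitionBy("customer_id")
      .orderBy(col("updated_at").desc)

    val merged = base.unionByName(delta)
      .withColumn("rn", row_number().over(latestFirst))
      .filter(col("rn") === 1)
      .drop("rn")

    merged.write.mode("overwrite").saveAsTable("customer_merged")
    spark.stop()
  }
}
```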
Confidential
Sr. DB2 Developer / Lead
Tools: IBM Data Studio, DB2 Control Center, Erwin, Design Advisor, UNIX shell scripting
Responsibilities:
- Supported development teams with database management (logical and physical design, performance tuning).
- Worked with DB2 LUW tools to monitor and manage DB2 databases on Windows and AIX.
- Worked on query performance tuning.
- Designed data models (Erwin).
- Created new database objects, including tables, views, triggers, and indexes (Erwin, DB2 Data Studio, DDL scripts, File-AID for DB2 (z/OS), Platinum).
- Wrote and tuned large SQL queries on Teradata.
- Populated dimension tables with data from various sources, including text files, writing shell or Perl scripts when necessary (UNIX shell, Perl, Load utility, Import, Export, Control Center, load/unload utility on z/OS).
- Developed stored procedures and triggers, creating additional indexes for performance improvement and automation (DB2 Data Studio, Design Advisor); see the sketch after this list.
- Migrated changes to test and production databases; unit tested and tuned the application.
- Documented the complete database design, high-level data design, and technical data-flow design.
- Supported system integration testing (SIT) and user acceptance testing (UAT).
- Worked as a senior developer on the CAS GAS Migration project.
- Coordinated with clients and the offshore team, anchoring the global delivery model.
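A minimal JDBC sketch in Scala of creating a supporting index and invoking a stored procedure on DB2 (assumes the DB2 JDBC driver is on the classpath); the connection details, index, and procedure names are illustrative assumptions:

```scala
// Minimal JDBC sketch: create an index and call a DB2 stored procedure.
// Connection string, credentials, and object names are illustrative
// assumptions, not taken from the project.
import java.sql.DriverManager

object Db2ProcSketch {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection(
      "jdbc:db2://dbhost:50000/SAMPLE", "dbuser", "dbpass")
    try {
      // Index to support the procedure's lookup predicate.
      conn.createStatement().execute(
        "CREATE INDEX idx_cust_region ON customer (region_id)")

      // Invoke an existing stored procedure with one IN parameter.
      val call = conn.prepareCall("CALL refresh_region_summary(?)")
      call.setInt(1, 42)
      call.execute()
    } finally conn.close()
  }
}
```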