Sr. Hadoop-Spark Consultant Resume

PROFESSIONAL SUMMARY:

  • 14+ years of technical experience in development and design, including 5+ years in Big Data (Hadoop, Spark) technologies.
  • Excellent work experience in a petabyte-scale data environment, with a daily inflow of 20-30 TB into the system, performing data analysis.
  • Excellent work experience in data migration, data preprocessing, validation, data analysis, and dashboard creation.
  • Developed a tool to migrate data from Oracle, SQL Server, and DB2 to HBase.
  • Worked on data analysis using Hive, Pig, Spark, Spark SQL, DataFrames, and Spark MLlib.
  • Worked on Data Quality Matrix, Data Lake implementation, and data validation in HDFS.
  • Extensive experience with the full SDLC, with proficiency in mapping business requirements, application design, development, integration, testing, technical documentation, and troubleshooting for mission-critical applications.
  • Extensive experience in Agile (Scrum) and Waterfall methodologies.
  • Experienced in quality management techniques using SEI CMM-based processes during all phases of the project life cycle.
  • Excellent communication, leadership, and interpersonal skills.
  • Extensively worked in an onsite-offshore delivery model as a lead and coordinator.
  • Excellent end-to-end modeling experience using various Hadoop technologies.
  • Excellent coding experience in Spark using Python and Scala.
  • Coding experience in Spark SQL, DataFrames, Spark Streaming, and MLlib.
  • Experience in data retrieval, preprocessing, and join optimization using Spark (see the illustrative sketch after this list).
  • Experience in data modeling in NoSQL databases (Cassandra, HBase).
  • Experience in ETL using Sqoop, Flume, Kafka, Spark Streaming, Hive, and HDFS.
  • Extensive experience in Hadoop architecture.
  • Experience in data analysis using Spark SQL, Paxata, Hive, and Impala.
  • Worked on Sqoop / Hive / Python / Scala to ingest data into HBase via HDFS from various sources: SQL Server, Oracle, Salesforce, flat files, and mainframe flat files.
  • Worked on Cloudera Manager / Ambari as an admin to identify cluster issues and manage the cluster, upgrades, users, and monitoring, working closely with the Cloudera team on cluster issue resolution.
  • Production ETL experience with Oracle, MySQL, DB2, and legacy systems.
  • Experience in HiveQL and Pig Latin, creating reports and writing scripts for business use cases.
  • Extensive DB2 UDB and Teradata experience with logical and physical design (stored procedures, indexes, tables, aliases, synonyms, and views), managing data, and performance tuning.
  • Led various project aspects such as system design and integration, coding of modules, peer reviews, and monitoring critical paths and taking appropriate action at the right time.
  • Performed design and gap analysis to ensure that business requirements and functional specifications are fulfilled.
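
As referenced in the join-optimization bullet above, the following is a minimal, illustrative Spark/Scala sketch of a broadcast-join optimization. All paths, table names, and columns (events, regions, region_id) are hypothetical placeholders, not details from any engagement listed here.

    // Minimal broadcast-join sketch; names and paths are hypothetical.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    object JoinOptimizationSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("JoinOptimizationSketch").getOrCreate()

        val events  = spark.read.parquet("/data/events")   // large fact table
        val regions = spark.read.parquet("/data/regions")  // small dimension table

        // Broadcasting the small side ships it to every executor and
        // avoids shuffling the large table across the network.
        val joined = events.join(broadcast(regions), Seq("region_id"))

        joined.write.mode("overwrite").parquet("/data/events_enriched")
        spark.stop()
      }
    }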

TECHNICAL SKILLS:

Domain: Telecom, Financial Market, Energy and Utility and Retail

Methodologies: Agile: (Scrum), Waterfall

Big data: Cloudera, Hortonworks, Hadoop

Hadoop: HDFS, Pig Latin, Hive, Oozie, Tez, Spark, Spark SQL, DataFrames

Data Migration: Sqoop, Flume, Spark Streaming, Kafka

Advanced Tools & Tech: Hue, Splice, Paxata, WhereScape, Informatica, Tableau, Talend

Database: NoSQL (HBase, Cassandra), DB2, SQL Server, Teradata, Oracle

Data Modeling: Data Studio, SQL Developer, Aginity, DBeaver

Languages: Python, Scala, SQL, Java

Administrative: Cloudera Manager, Ambari

Work Tracking: JIRA, TFS, Agile Scrum, Rational Project Management (RPM), Rational Team concert (RTC)

Incident Management: Remedy, Peregrine, HP QC, HP ALM

Job Schedulers: Oozie, Active Batch, AutoSys, Control-M

Configuration: Ant, Maven, GIT

IDE: IntelliJ IDEA, Eclipse

Data Exchange: XML, JSON, Avro, Parquet, ORC

Operating System: Linux, Unix, Windows 10/8/7/XP/2000, MVS-OS/390, z/OS

Other Tools/Technologies: ZooKeeper, Splice, CICS, PL/1, COBOL, JCL, Easytrieve, REXX, VSAM, Endevor, Librarian, SCLM, ChangeMan, File-AID, IBM File Manager, Rapid SQL, DB2 Command Line, Platinum, Control-M, Lotus Notes, Outlook, MS Office, VMware, VirtualBox

WORK EXPERIENCE:

Confidential

Sr. Hadoop-Spark Consultant

Methodology: Agile (Scrum)

Technology: Python, Spark, Spark SQL, Spark DataFrames, Hive, HBase, Shell Script, Ambari, Scala, Kafka, AutoSys, Talend, MQ

Database: HBase, Derby

Responsibilities:

  • Coded data analytics models using PySpark, Spark Streaming, Hive, Pig, and HBase.
  • Created and implemented various analytical models.
  • Performed ETL (developed using Python, Spark, and the Sqoop framework), data validation, and visualization; a streaming ETL sketch follows this list.
  • Worked on DAG creation using Talend to bring data into the Data Lake from Oracle sources.
  • Used Spark DataFrames for faster retrieval on data joins across tables of a NoSQL (HBase) database.
  • Involved in a multi-tier, multimodal implementation (Auto Subrogation) in which Hadoop interacts with SAS and Claim Center (a complete J2EE-based application) via MQ.
  • Optimized performance for reads and writes.
  • Worked on Ambari/AutoSys for monitoring and selected administration activities.
  • Responsible for mentoring and leading practitioners.
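
A minimal Scala sketch of the kind of streaming ETL described above, using Spark Structured Streaming's Kafka source (this engagement used both PySpark and Scala). The broker, topic, and HDFS paths are hypothetical placeholders.

    // Kafka-to-HDFS streaming ETL sketch; broker, topic, and paths are hypothetical.
    import org.apache.spark.sql.SparkSession

    object StreamingEtlSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("StreamingEtlSketch").getOrCreate()

        // Read the raw event stream from Kafka.
        val raw = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "claims-events")
          .load()

        // Kafka delivers bytes; cast key and payload to strings for downstream parsing.
        val events = raw.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

        // Land the stream in HDFS; the checkpoint directory makes the query restartable.
        events.writeStream
          .format("parquet")
          .option("path", "/datalake/claims/raw")
          .option("checkpointLocation", "/datalake/claims/_checkpoints")
          .start()
          .awaitTermination()
      }
    }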

Confidential, Bellevue, WA

Sr. Hadoop-Spark Consultant

Methodology: Agile (Scrum)

Technology: Spark, Spark SQL, Spark DataFrames, MLlib, Hive, HBase, Shell Script, Ambari, Scala, Kafka, Active Batch, Kerberos, YARN

Database: HBase, Derby

Responsibilities:

  • Involved in data analytics.
  • Worked on various analytical POCs: identifying auto intenders, identifying customers shopping with other providers, likely-churner analysis, market-wise Facebook usage, external user networks, influencer analytics, inferred moving, inferred network auto intenders, inferred age, and inferred gender based on first name.
  • Worked on ETL, data validation, and visualization using Kafka, Spark Streaming, Scala, Hive, and Tableau.
  • Used Spark DataFrames for faster retrieval on data joins across NoSQL tables.
  • Coded classes in Spark/Scala for data analytics across various modules.
  • Queried tables holding 40-100 TB of data; a partition-pruning sketch follows this list.
  • Worked on Ambari for monitoring and selected administration activities.
  • Worked as lead, responsible for mentoring and leading practitioners.
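
A hedged Scala sketch of querying very large Hive-backed tables of the kind noted above, relying on partition pruning so the scan touches only one day's partition. The table, columns, and partition key (usage_events, subscriber_id, load_date) are hypothetical.

    // Partition-pruned aggregation over a large Hive table; names are hypothetical.
    import org.apache.spark.sql.SparkSession

    object LargeTableQuerySketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("LargeTableQuerySketch")
          .enableHiveSupport()
          .getOrCreate()

        // Filtering on the partition column keeps a query against a
        // multi-TB table from scanning the whole dataset.
        val dailyUsage = spark.sql(
          """SELECT subscriber_id, SUM(bytes_used) AS total_bytes
            |FROM usage_events
            |WHERE load_date = '2017-06-01'
            |GROUP BY subscriber_id""".stripMargin)

        dailyUsage.write.mode("overwrite").saveAsTable("daily_usage_summary")
        spark.stop()
      }
    }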

Confidential, El Segundo, CA

Sr. Hadoop Developer/Lead

Methodology: Agile (Scrum)

Technology: Splice, Paxata, WhereScape, Sqoop, Oozie, HDFS, Eclipse, HBase, Shell Script, Hue, Hive, Spark, Scala, Active Batch

DBMS: DB2, Oracle 10g, SQL Server, Salesforce, Derby

Responsibilities:

  • Moved data from more than 75 databases comprising more than 2,000 tables to HBase using Splice.
  • Worked on data modeling in HBase to normalize data and speed up retrieval.
  • Used Spark DataFrames for faster retrieval on data joins across NoSQL tables.
  • Coded classes in Spark/Scala for standard validation checks driven by configuration files; a sketch follows this list.
  • Worked on Cloudera Manager administration: adding/removing nodes, user administration, issue resolution, adding new services, and version and Cloudera upgrades.
  • Worked on the EDH to bring data into a central hub, with the intent of faster, more reliable data transfer and validation to HDFS for data analysis.
  • Worked on a tool to bring multiple databases and multiple tables into the EDH in one full run.
  • Developed workflows to schedule various Hadoop programs using Active Batch.
  • Worked as lead, responsible for work assignment and leading 4 offshore practitioners.
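
An illustrative Spark/Scala sketch of configuration-driven validation checks of the kind described above. The checks are hard-coded here for brevity; a real run would load them from a configuration file. All paths and column names are hypothetical.

    // Configuration-driven null checks; paths and columns are hypothetical.
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.col

    object ValidationSketch {
      // A check is a named column predicate; in practice these entries
      // would be parsed from a properties or JSON configuration file.
      final case class Check(name: String, column: String)

      def nullCount(df: DataFrame, check: Check): Long =
        df.filter(col(check.column).isNull).count()

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("ValidationSketch").getOrCreate()
        val df = spark.read.parquet("/edh/staging/customers")

        val checks = Seq(
          Check("customer_id_not_null", "customer_id"),
          Check("email_not_null", "email"))

        // Report how many rows fail each configured check.
        checks.foreach(c => println(s"${c.name}: ${nullCount(df, c)} failing rows"))
        spark.stop()
      }
    }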

Confidential, Thousand Oaks, CA

Sr. Hadoop Developer

Methodology: Agile (Scrum)

Technology: Hive, Spark, Scala, Pig Latin, MapReduce, Sqoop, Oozie, HDFS, Eclipse, Cassandra, Shell Script, Hue

DBMS: DB2, Oracle 10g

Responsibilities:

  • Worked on a DQM POC and implementation with the intent of faster, more reliable data transfer and validation to HDFS for data analysis.
  • Coded classes in Spark/Scala for validation checks (standard, file, and business checks) driven by manifest files.
  • Processed data received in various formats: Avro, Parquet, bz2, zip, SequenceFile, and text files.
  • Responsible for creating data partitions and buckets using Hive; a sketch follows this list.
  • Developed workflows to schedule various Hadoop programs using Oozie.
  • Worked as lead, responsible for assigning work to and leading 5 offshore practitioners.
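
A short sketch of creating a partitioned, bucketed Hive table of the kind referenced above, issued through Spark SQL to stay in Scala. The table, columns, and bucket count are hypothetical choices for illustration.

    // Partitioned, bucketed Hive table DDL via Spark SQL; names are hypothetical.
    import org.apache.spark.sql.SparkSession

    object HivePartitionSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HivePartitionSketch")
          .enableHiveSupport()
          .getOrCreate()

        // Partition by load date for pruning; bucket by record key so joins
        // and sampling on that key touch a bounded number of files.
        spark.sql(
          """CREATE TABLE IF NOT EXISTS lab_results (
            |  record_id  STRING,
            |  test_code  STRING,
            |  result_val DOUBLE
            |)
            |PARTITIONED BY (load_date STRING)
            |CLUSTERED BY (record_id) INTO 32 BUCKETS
            |STORED AS ORC""".stripMargin)

        spark.stop()
      }
    }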

Confidential, Atlanta, GA

Hadoop Lead

Methodology: CMMI

Technology: Hive, Pig Latin, MapReduce, Sqoop, Spark, Oozie, Cassandra, HDFS, Eclipse, Hue

DBMS: DB2 (LUW), Oracle 10g

Responsibilities:

  • Worked on a Customer Door Ship POC intended to reach customers ahead of demand with faster/same-day delivery, adding value for customers with the ultimate goal of increasing revenue.
  • Responsible for logical and physical design of Cassandra tables.
  • Worked on nodetool, CQL, and DevOps tasks.
  • Performed data modeling for Cassandra tables.
  • Responsible for managing data coming from different sources.
  • Imported and exported data into HDFS using Sqoop and Flume.
  • Created data partitions and buckets for better performance of Hive queries.
  • Responsible for writing merging algorithms for incremental data updates; a sketch follows this list.
  • Responsible for writing Pig (Pig Latin) scripts for ad-hoc data retrieval.
  • Developed workflows to schedule various Hadoop programs using Oozie.
  • Involved in analysis, design, and testing phases, and responsible for documenting technical specifications.
  • Developed and supported existing MapReduce programs and jobs for various data-cleansing features such as schema validation, filtering, joins, and row counts.
  • Worked as lead, responsible for assigning work to and leading 4 practitioners.
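
A hedged Spark/Scala sketch of the incremental-update merge described above: base rows whose keys appear in the delta are replaced by the delta rows. The key (order_id) and paths are hypothetical.

    // Incremental merge: delta rows win on key collisions; names are hypothetical.
    import org.apache.spark.sql.SparkSession

    object IncrementalMergeSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("IncrementalMergeSketch").getOrCreate()

        val base  = spark.read.parquet("/warehouse/orders/base")
        val delta = spark.read.parquet("/warehouse/orders/delta")

        // Keep every base row whose key is absent from the delta,
        // then append the delta so updated rows replace stale ones.
        val merged = base
          .join(delta.select("order_id"), Seq("order_id"), "left_anti")
          .unionByName(delta)

        merged.write.mode("overwrite").parquet("/warehouse/orders/merged")
        spark.stop()
      }
    }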
