
Senior Big Data Engineer Resume


VA

SUMMARY:

  • Over 8 years of progressive professional experience in data analysis, system design and development using Big Data/Hadoop, Teradata and Mainframe technologies
  • More than 4 years of in-depth knowledge and hands-on experience with Big Data/Hadoop core components like MapReduce, HDFS, Spark, Hive, Impala, Sqoop, Oozie, Hue
  • Experienced in working with business experts to identify, prioritize and implement solutions to improve efficiency and support new business
  • Experienced in supporting ad-hoc requests and creating reusable queries using MS Excel and SQL
  • Good at analyzing gaps in current processes and working with SMEs to resolve issues and process data on priority
  • Experienced professional with a successful career in the banking, finance and insurance domains
  • Good acumen in software development cycles involving system study, analysis, development, enhancement, implementation and support activities
  • Experienced in CI/CD related activities involving GIT, BitBucket, Jenkins, Artifactory, Ansible & JIRA
  • Experienced in Agile and Waterfall methodologies
  • Vast experience in mapping client requirements and designing solutions by understanding the core of the change
  • Experienced in ETL processing via Spark, Hive and ETL tools
  • Experienced in writing Unix shell scripts for builds and deployment in different environments
  • Strong experience working with relational databases like Teradata and on mainframes
  • Good exposure to integrated testing, data analysis and data validation on Hadoop Environment
  • Proactive nature that has earned praise from clients
  • Excellent interpersonal skills that help in clearly stating and recording ideas
  • Strong analytical, organizational and leadership skills that have earned vital roles

TECHNICAL SKILLS:

Hadoop Technology: Cloudera Hadoop, MapReduce (MR1, MR2- YARN), Spark, HDFS, Hive, Impala, Pig, Sqoop, Oozie, Hue, Cloudera Manager, Kafka, Flume, HCatalog, Spark Streaming, PySpark, HBase, Druid, TEZ, Ambari, Jupyter Hub / Zeppelin Notebooks

Operating System: UNIX, Linux, MS-DOS, Windows, OS 390 Mainframe

Database: Teradata, DB2, IMS, Presto DB

ETL Tool: DMXpress Hadoop ETL tool by Syncsort, Talend, IBM DataStage, IBM Big SQL

CI/CD Tool: JIRA, SVN, BitBucket, GIT, Jenkins, Artifactory, Ansible, CVS

Scheduling Tool: Autosys, CA7, Crontab, Tidal

Language: Java, Scala, JCL, COBOL, SQL, Unix shell script, Python, IMS

Other Software: Eclipse, Maven, SharePoint, Maximo/Remedy, Teradata SQL assistant, TSO/ISPF, MS Office Tools, QlikView

PROFESSIONAL EXPERIENCE:

Confidential, VA

Senior BIG DATA ENGINEER

Responsibilities

  • Attending business meetings and collaborating with business teams to understand and articulate the project requirements and assess them with the development team
  • Helping to create high quality documentation supporting the design/coding task (Data Lineage, Data Mapping, High/Low Level Design etc.)
  • Developing the new ETL/ELT framework using Big Data technologies (Hadoop, Teradata, Informatica, IBM DataStage, etc.)
  • Performing data migration/data ingestion from source systems to distributed file systems using various tools like NiFi, Sqoop, IBM DataStage, TDCH, etc. to achieve the best results and maximum throughput
  • Performing POC for real time streaming data using Kafka and Spark Streaming
  • Creating a data lake with staging and base layers on Hadoop to maintain various datasets from different lines of business
  • Performing ETL/ELT operations via Spark (PySpark) using Spark SQL, RDD operations, etc. and storing the results as Hive tables (see the sketch after this list)
  • Creating Hive tables (external/managed) with partitioning/bucketing based on the amount of data being processed
  • Deciding the ideal storage platform for the application being designed based on the type of the data (historical or incremental), format of the data (structured, semi-structured and unstructured), compression requirements, data frequency, pattern and consumer of the data
  • Preparing a security framework to maintain the data privacy for the data stored on the distributed file system
  • Creating reusable components to be used to perform similar set of operations
  • Performing data archival by creating components using DistCp and the Hadoop archive command to reduce NameNode utilization
  • Getting the data ready for the data visualization tools like Tableau, QlikView, Jupyter Hub / Zeppelin Notebooks
  • Performing performance tuning on existing applications and increasing their throughput through various techniques
  • Working with the system admins to change cluster configuration/settings in order to achieve optimal performance from the cluster
  • Using Presto DB to perform analytical queries for business users
  • Using JIRA, SVN, BitBucket, GIT for CI/CD related activities
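
A minimal PySpark sketch of the ETL/ELT pattern referenced above: read a staged dataset, apply Spark SQL transformations, and persist the result as a partitioned Hive table. The database, table and column names (staging.stg_transactions, curated.transactions, txn_ts, txn_date) are hypothetical placeholders, not the actual project objects.

```python
# Minimal PySpark ETL sketch: staging layer -> curated, partitioned Hive table.
# All table/column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("curated-load")
         .enableHiveSupport()        # needed to read/write Hive tables
         .getOrCreate())

# Read the staging layer landed by the ingestion jobs
stg = spark.table("staging.stg_transactions")

# Example transformations: standardize amounts and derive a partition column
curated = (stg
           .withColumn("txn_amount", F.col("txn_amount").cast("decimal(18,2)"))
           .withColumn("txn_date", F.to_date("txn_ts")))

# Persist to the base layer as a partitioned Hive table
(curated.write
        .mode("overwrite")
        .partitionBy("txn_date")
        .saveAsTable("curated.transactions"))
```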

Confidential, NC

LEAD Big DATA Developer cum Analyst (Teradata & Hadoop) | Period

Responsibilities

  • Gathering necessary information from users on Anti-Money Laundering (AML) via transaction, customer profiling and posting data
  • Doing end-to-end data analysis to understand the business logic and design approach to build a new data model
  • Creating mapping documents for the AML domain for data lineage, involving Teradata and Hadoop as source and target systems respectively
  • Cataloging and documenting the data sources applicable for use cases to form the data layer
  • Creating data flow diagram using MS VISIO, etc.
  • Creating data models for the landing zone using the documentation
  • Working with the Business Team to gather the requirements and prioritize their needs
  • Developing and implementing data collection reports that optimized statistical efficiency and data quality
  • Working with clients on initiatives involving architecture, data warehousing, data platform migration, performance & optimization, data analysis, ETL development and Hadoop data integration, leveraging knowledge of Hadoop, Teradata, ETL and analytics to solve customers' problems
  • Understanding the various sources involved in formulating use cases like Panama Papers, FinCEN 314(a), Foreign Terrorist Fighters, etc.
  • Understanding the various data sources involving transaction modes like wire, cash, card, etc.
  • Proposing technical solutions and laying out the plan for successful implementation
  • Preparing High-Level and Low-Level Design documents
  • Supporting the daily load and incremental load from Teradata to Hadoop data layer
  • Embedding data quality checks using Teradata, Hive, Spark, etc. (a sketch follows this list)
  • Performing unit testing and tuning the code as required
  • Preparing necessary technical standards and functional manuals for the application
  • Scheduling the jobs using Autosys scheduler
  • Using JIRA, SVN, BitBucket, GIT, Jenkins, Artifactory & Ansible for CI/CD related activities
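
A minimal PySpark sketch of the kind of embedded data quality check mentioned above: reconcile row counts between the Teradata-sourced staging table and the Hadoop target, and flag null or duplicate business keys. Table and column names (staging.aml_transactions, aml.transactions, txn_id) are hypothetical placeholders.

```python
# Minimal PySpark data quality check sketch; names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-checks").enableHiveSupport().getOrCreate()

# Row-count reconciliation between staging and target layers
source_count = spark.table("staging.aml_transactions").count()
target = spark.table("aml.transactions")
target_count = target.count()

# Key-level checks: null business keys and duplicate business keys
null_keys = target.filter(F.col("txn_id").isNull()).count()
dup_keys = (target.groupBy("txn_id").count()
                  .filter(F.col("count") > 1).count())

if source_count != target_count or null_keys > 0 or dup_keys > 0:
    raise RuntimeError(
        f"DQ failure: src={source_count}, tgt={target_count}, "
        f"null_keys={null_keys}, dup_keys={dup_keys}")
```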

Confidential, NC

LEAD Big DATA Developer cum Analyst (Teradata & Hadoop)

Responsibilities

  • Converting the existing Mainframe-Teradata ETL to Hadoop ETL in order to leverage Teradata computational storage
  • Doing end-to-end data analysis to understand the business logic and design approach to build a new data model
  • Creating mapping documents for data lineage, involving Teradata and Hadoop as source and target systems respectively
  • Cataloging and documenting the data sources applicable for use cases to form the data layer
  • Creating data flow diagram using MS VISIO, etc.
  • Creating data models for the landing zone using the documentation
  • Proposing technical solutions and laying out the plan for successful implementation
  • Preparing High-Level and Low-Level Design document
  • Using Syncsort’s DMX-H ETL tool to facilitate application development in HDFS
  • Developing Map Reduce and Spark codes to support the use cases
  • Using Java and Scala for programming
  • Developing Hive scripts equivalent to the existing Teradata logic
  • Using Sqoop to move data in and out of Teradata
  • Developing automated scripts for all jobs in order to complete loading data from mainframe to Teradata after processing in Hadoop
  • Handling data from Flume and Kafka sources via Spark Streaming (see the streaming sketch after this list)
  • Scheduling the Hadoop jobs using Oozie and Autosys
  • Developing customized Hive UDFs
  • Handling fixed-block, variable-block, text-delimited, binary, Avro and Parquet files
  • Using Network Data Movement (NDM) / Connect Direct to move data across servers
  • Developing MapReduce and Spark code to structure the data
  • Using Impala for end user queries and validation
  • Solving issues raised by other application teams via Nexus request
  • Building archival and recovery jobs for DR purposes
  • Building reusable common components that reduce application coding effort
  • Preparing necessary technical standards and functional manuals for the application
  • Using JIRA, SVN, BitBucket, GIT, Jenkins, Artifactory & Ansible for CI/CD related activities
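
An illustrative PySpark sketch of streaming ingestion from Kafka, as referenced above: consume a topic and land the raw events as Parquet on HDFS. The broker address, topic name and HDFS paths are hypothetical, and this uses the Structured Streaming Kafka source as an example; the original work may have used the DStream-based Spark Streaming API instead.

```python
# Illustrative PySpark Structured Streaming sketch: Kafka -> Parquet on HDFS.
# Broker, topic and paths are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

raw = (spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", "broker1:9092")
            .option("subscribe", "txn-events")
            .option("startingOffsets", "latest")
            .load())

# Kafka delivers key/value as binary; cast the payload to string
events = raw.select(F.col("value").cast("string").alias("payload"),
                    F.col("timestamp"))

query = (events.writeStream
               .format("parquet")
               .option("path", "hdfs:///data/raw/txn_events")
               .option("checkpointLocation", "hdfs:///chk/txn_events")
               .trigger(processingTime="1 minute")
               .start())
query.awaitTermination()
```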

Confidential

Lead Developer cum analyst

Responsibilities

  • Used Hadoop as a data processing layer when moving the data from mainframe to Teradata
  • Used Syncsort’s DMX-H ETL tool to facilitate application development in HDFS
  • Developed MapReduce jobs using Java for data manipulation
  • Used Hive, Oozie and Sqoop extensively for ETL processing
  • Created a batch calculation process with the help of historical data, which consisted of the customer's account balance and aggregated deposits & investments (see the sketch after this list)
  • Designed the model and flow to achieve the requirement
  • Changed the BTEQ/MLOAD/TPUMP/FLOAD/FASTEXPORT/TPT/JCL scripts as per requirement
  • Wrote and executed the Teradata SQL scripts to validate the end data
  • Created views on the tables along with access categories to provide data access to the users
  • Prepared design, test plan, implementation plan, test scripts, validation script and unit testing documents
  • Prepared a job flow diagram in MS VISIO in order to hand over the implementation to the production support team
  • Tuned poorly performing Teradata SQL queries and inefficient collect stats
  • Provided root cause analysis on critical and non-critical issues that occurred in production
  • Analyzed the dashboard and performance metrics
  • Prepared necessary technical and functional manuals for the application
  • Used JIRA, SVN, BitBucket, GIT, Jenkins, Artifactory & Ansible for CI/CD related activities
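
A minimal PySpark sketch of the batch calculation described above: aggregate historical deposits and investments per customer from a history table and publish the result for reporting. The table and column names (history.customer_transactions, reporting.customer_metrics, txn_type, amount) are hypothetical placeholders.

```python
# Minimal PySpark batch aggregation sketch; names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("customer-metrics").enableHiveSupport().getOrCreate()

hist = spark.table("history.customer_transactions")

# Aggregate deposits and investments per customer from historical data
metrics = (hist.groupBy("customer_id")
               .agg(F.sum(F.when(F.col("txn_type") == "DEPOSIT", F.col("amount"))
                           .otherwise(0)).alias("total_deposits"),
                    F.sum(F.when(F.col("txn_type") == "INVESTMENT", F.col("amount"))
                           .otherwise(0)).alias("total_investments"),
                    F.max("balance_date").alias("latest_balance_date")))

metrics.write.mode("overwrite").saveAsTable("reporting.customer_metrics")
```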
