Big Data Cloud Engineer Resume
PROFESSIONAL SUMMARY:
- 7 years of experience as a Big Data Developer and Engineer working with cutting-edge technology
- 13 years of overall IT experience as a polyglot programmer in software design and development
- Hands-on with the Hadoop/Spark stack, Data Lakes, NoSQL, Kafka, and streaming, on-premise and in the cloud
- Well versed in Amazon Web Services and Google Cloud Platform storage and compute capabilities
- In-depth knowledge of writing MapReduce and Apache Spark jobs in Scala/Java
- Lead Big Data Engineer onsite for 5 years in major divisions: IP & Science and Finance & Risk
- Experienced in server-side financial software development using tick/time-series data
- Interested in emerging technology, evaluating NoSQL options, data structures & algorithms, and performance tuning
- Developed an advanced human-performance platform for an Athlete Management System on AWS with Spark
- Proficient in Hive and Pig over large-scale, continually growing unstructured/semi-structured/structured data warehouses (OLAP)
- Design & implement data pipelines and data models/massaging, and optimize ETL workflows
- Knowledgeable in data ingestion to HDFS from RDBMS and other sources using Sqoop/Flume
- Proven track record of installing, troubleshooting & managing Cloudera and Hortonworks multi-node clusters
- Visa particulars: Canada Permanent Resident, citizenship due in Sep 2019, valid USA B1
TECHNICAL SKILLS:
Hadoop: MapReduce, HDFS, YARN, Hive, Pig, Sqoop, Flume, Oozie, Hue, HBase
Spark: Spark Core, Spark SQL, Spark Streaming, Mesos
Others: NoSQL, Cassandra, Phoenix, Zookeeper, Kafka, NiFi, Ambari, Qubole
Cloud DevOps (AWS): console, CLI, S3, EBS, Glacier, IAM, VPC, EC2, EMR, Kinesis, RDS
Cloud DevOps (GCP): console, shell, GCS, IAM, GCE, DataProc, Beam, DataFlow, BigQuery
Programming: Java, Scala, C, C++, Python (beginner), Shell Script, JavaScript, SQL, Testing
File Formats: Columnar (ORC, Parquet), JSON, XML, Avro, BSON, Protocol Buffers
Tools: Eclipse, IntelliJ, Maven, GitHub, SVN, JUnit, Cygwin, Tomcat, Jira, MS Visio
Architecture: Big Data, Distributed systems, Stock tick data, OOAD, Architectural patterns, Data Lake, Data Model, UML, Star Schema
Domain Experience: Finance, Ecommerce, Sport Science, Telecom, Content Technology, Intellectual Property & Science
Methodology: Agile, Scrum, SDLC, Waterfall
WORK EXPERIENCE:
Confidential
Big Data Cloud Engineer
Responsibilities:
- Focused on improving how the customer allocates inventory to its store locations, taking sales forecasts and regional effects into account
- Architected the Replenishment Bot from development through to production
- RDBMS data ingestion to GCP: migrated a 4 TB compressed/encrypted Oracle raw dump to persistent storage, set up an Oracle DB on GCE, and processed/exported CSV data to a GCS bucket using Data Pump
- Developed routines for a one-time migration of 36 TB of exported data from the persistent store to GCS
- Cleansed & preprocessed data using Scala & Spark on a DataProc cluster so the Data Science team could apply machine learning techniques to create various bots (see the cleansing sketch after this list)
- Imported data from Google Cloud Storage into BigQuery in the form of datasets for analytics
- Used Cloud Composer workflows & the Airflow scheduler to run the job automatically every week
- Optimized Spark execution to cut the job from 11 hours to under 1 hour & opened the door to scaling up (see the tuning sketch after this list)
- Solved an infrastructure problem to utilize the cluster's full capacity: executors, executor cores & RAM
- Troubleshot a disk-out-of-space issue: Spark was not using all available RAM and spilled intermediate data to disk, which was never freed because the job never completed; the excessive disk I/O caused slow job execution
- Understood the project's blocker issues and the Hadoop/Spark cluster infrastructure with its limitations: antivirus, blocked ports, CPU/cores/RAM & number of nodes
- Skills: Java, Scala, Spark ecosystem, GCP, GCE, GCS, GCFS, Oracle, EDW, Scripting, MS Visio
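A minimal sketch of the kind of DataProc cleansing job described above; the bucket paths and column names are hypothetical, not the actual schema:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object CleanseJob {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("CleanseRawSales")
          .getOrCreate()

        // Read the Data Pump CSV exports from GCS (path is hypothetical).
        val raw = spark.read
          .option("header", "true")
          .csv("gs://example-bucket/export/sales/*.csv")

        // Basic cleansing: drop fully-null rows, trim keys, normalize dates,
        // and de-duplicate on the natural key.
        val cleansed = raw
          .na.drop("all")
          .withColumn("store_id", trim(col("store_id")))
          .withColumn("sale_date", to_date(col("sale_date"), "yyyy-MM-dd"))
          .dropDuplicates("store_id", "sku", "sale_date")

        // Write Parquet for the Data Science team and the BigQuery load.
        cleansed.write.mode("overwrite")
          .parquet("gs://example-bucket/cleansed/sales/")

        spark.stop()
      }
    }

The cleansed Parquet output is the kind of dataset that would then be imported into BigQuery for analytics.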
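An illustrative executor-sizing configuration for the tuning described above; the specific numbers depend on node CPU/RAM and are assumptions here, not the production values:

    import org.apache.spark.sql.SparkSession

    object ReplenishmentJob {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ReplenishmentJob")
          // Explicit sizing so the job uses the cluster's full capacity
          // instead of small defaults (all values illustrative).
          .config("spark.executor.instances", "20")
          .config("spark.executor.cores", "5")
          .config("spark.executor.memory", "14g")
          // More memory for execution reduces spilling shuffle data to disk.
          .config("spark.memory.fraction", "0.7")
          // Right-sized shuffle partitions keep spill files small and I/O low.
          .config("spark.sql.shuffle.partitions", "400")
          .getOrCreate()

        // ... job logic ...
        spark.stop()
      }
    }

Giving executors enough memory and right-sizing shuffle partitions addresses the disk-spill symptom directly: intermediate data stays in RAM instead of accumulating on disk.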
Confidential
Advanced Analytics & Data Strategy Assessment
Responsibilities:
- Conducted interviews & workshops with IT leadership team to understand current state pain points
- Evaluated Azure, Amazon & Google cloud offerings for Data Lake implementation
- Led current state analysis by investigating existing functionalities & architectural stacks
- Analyzed the gap between the As-Is and To-Be states, including issues with DB synchronization, data deduplication, etc.
- Developed a target architecture & delivery model to meet the client's Advanced Analytics needs
- Skills: Big Data, AWS, GCP, Azure, Data Warehouse, PowerPoint, Diagramming Tools, MS Visio
Kinduct Technologies, Halifax
Big Data Engineer
Responsibilities:
- Developed an advanced human-performance platform for an Athlete Management System in Sports Science
- Implemented data collection and backend processing for a sports-healthcare analytical system
- Built a data pipeline using Scala & Spark that transforms raw game data into business-relevant insights
- Set up secure, reliable Big Data infrastructure for customers from the ground up on the AWS Cloud
- Set up Hortonworks HDP Hadoop environments & used Ambari to configure them
- POC Project Aggregation: successfully transitioned a proof of concept to production on AWS
- Re-designed & developed the current Dynamic Reports app to use Spark SQL for complex aggregated metrics, connecting to the Prod environment that provides data for Dynamic Reports (see the sketch after this list)
- Scala/Spark: processed raw XY game data in formats such as JSON/XML for the Data Science team
- NBA production data providers (XML/JSON): Sportradar, Second Spectrum, SportVU
- Skills: Java, Scala, Hadoop/Spark ecosystem, AWS services, JSON, XML, Parquet, ORC
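A hedged sketch of the Spark SQL aggregation pattern behind reports like the above; the S3 path, table, and column names are hypothetical, not the platform's actual schema:

    import org.apache.spark.sql.SparkSession

    object DynamicReportsSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("DynamicReports").getOrCreate()

        // Raw per-event game data landed by the ingestion pipeline.
        val events = spark.read.parquet("s3://example-bucket/game-events/")
        events.createOrReplaceTempView("game_events")

        // One "dynamic report": weekly workload metrics per player.
        val playerLoad = spark.sql(
          """SELECT player_id,
            |       date_trunc('week', event_time) AS week,
            |       SUM(distance_m)  AS total_distance_m,
            |       AVG(heart_rate)  AS avg_heart_rate,
            |       MAX(speed_mps)   AS peak_speed_mps
            |FROM game_events
            |GROUP BY player_id, date_trunc('week', event_time)""".stripMargin)

        playerLoad.write.mode("overwrite")
          .parquet("s3://example-bucket/reports/player_load/")

        spark.stop()
      }
    }

A benefit of this pattern is that each report is plain SQL over a temp view, so new aggregated metrics can be added without touching the ingestion code.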
Thomson Reuters, Bangalore
Big Data Lead Engineer
Responsibilities:
- Involved in the complete life cycle of custom dataset metrics generation from Web of Science data for publishers, researchers & universities; developed & optimized Hive queries for better throughput
- Designed the DB schema in HDFS for research articles in the form of cubes, facts & dimensions
- Computed metrics such as Times Cited, Hot Paper, Highly Cited Paper, university rank & H-Index
- Installed a multi-node Cloudera CDH 5.4 Hadoop cluster, configured with Cloudera Manager
- Optimized performance of the current IMS ingestion system to HDFS/HBase from the Oracle-based MPR
- Used Sqoop extensively to import/export structured data between HDFS and RDBMS
- Developed MapReduce jobs for data cleansing, and moved Hive data to HBase for interactive & row-level updates
- Data Lake Content Integration: captured & ingested Grant & Funding data from external sources
- Data Lake real-time cross-linking of Literature (WOS) & Patents (TI) documents into the AWS Cloud to create One Platform for the various data owned by TR, leveraged to create valuable insights
- Implemented a Lambda Architecture: Spark Streaming, Kafka, HBase, Hive, patents resolution (see the streaming sketch after this list)
- Collected & aggregated large amounts of log data using Flume, staging the data in HDFS
- Expertise in developing custom UDFs, SerDes & InputFormats in Java to extend Hive/Pig execution (see the UDF sketch after this list)
- Designed & implemented the backend of the ALUM Charts application using HBase, Spark & FusionCharts to visualize article access trends over time & the correlation between accesses and the citations an article receives
- Processed usage logs, filtering out bot sessions and unproductive data
- Optimized performance against common bottlenecks: disk & network I/O, RAM, CPU/threads
- Improved grant-information coverage in WOS; created business rules to add & de-dupe grant info
- Data Modeling: Star Schema Modeling, Cubes, Fact & Dimension, Visio
- Environment: Cloudera CDH 5.4, Hadoop 2.6; cluster: 160 nodes; total data size: 3 petabytes
- Skills: Java, Scala, Hadoop/Spark ecosystem, AWS Cloud, NoSQL, Data Lake, Data Model
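A speed-layer sketch of the Lambda Architecture mentioned above, using the Spark Streaming Kafka direct stream; the broker, topic, and group id are hypothetical, and the real sink would write resolved documents to HBase:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010._
    import org.apache.kafka.common.serialization.StringDeserializer

    object PatentResolutionStream {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("PatentResolutionStream")
        val ssc  = new StreamingContext(conf, Seconds(10))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092",
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "patent-resolution",
          "auto.offset.reset"  -> "latest"
        )

        // Direct stream over the incoming patent/literature documents.
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("patent-docs"), kafkaParams)
        )

        // Parse each message and hand it to the resolution layer; the real
        // implementation writes matches to HBase per partition.
        stream.map(_.value())
          .foreachRDD { rdd =>
            rdd.foreachPartition { docs =>
              docs.foreach(doc => println(s"resolved: ${doc.take(80)}"))
            }
          }

        ssc.start()
        ssc.awaitTermination()
      }
    }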
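A minimal Hive UDF sketch of the kind described above; the resume's UDFs were written in Java, but Scala is used here to keep all examples in one language, and the grant-id normalization rule is a hypothetical illustration:

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Hive resolves evaluate() by reflection; this one normalizes a
    // grant identifier by trimming, upper-casing, and removing whitespace.
    class NormalizeGrantId extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) return null
        new Text(input.toString.trim.toUpperCase.replaceAll("\\s+", ""))
      }
    }

Once packaged into a jar, such a UDF is registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION, then called like any built-in function in queries.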
- Developed server-side financial software for Velocity Analytics (Reuters Tick Capture Engine)
- Understood financial risk management, trades, quotes, corporate actions, stock splits, symbology, etc.
- Involved in multiple successful POCs, including a horizontal-scaling solution
- Productized the Elektron TimeSeries DB migration from a proprietary archive to a NoSQL store
- Managed & analyzed stock tick data, market data feeds, and Data & Tick History data
- Implemented the Firebird data migration to enhance read/write performance
- Ingested fact summaries into HBase as the data store, with query sharding, for the ElektronTS Hosted Solution
- Prototyped a Cassandra-based time-series data store, designing the data model for real-time ingestion (see the schema sketch after this list)
- Skills: C++, Cassandra, Hadoop, HDFS, HBase, TCL, SQL, Batch, Financial Trading
- Finance Skills: Reuters Tick Capture Engine, RMDS, Timeseries & Data, Tick History
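A sketch of a Cassandra time-series data model of the kind prototyped above; the keyspace, table, and column names are hypothetical. Partitioning by (ric, trade_date) keeps one instrument-day of ticks per partition, clustered by timestamp for fast range scans:

    import com.datastax.driver.core.Cluster

    object TickStorePrototype {
      def main(args: Array[String]): Unit = {
        val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
        val session = cluster.connect()

        // Assumes the marketdata keyspace already exists.
        session.execute(
          """CREATE TABLE IF NOT EXISTS marketdata.ticks (
            |  ric        text,
            |  trade_date date,
            |  ts         timestamp,
            |  price      double,
            |  volume     bigint,
            |  PRIMARY KEY ((ric, trade_date), ts)
            |) WITH CLUSTERING ORDER BY (ts DESC)""".stripMargin)

        cluster.close()
      }
    }

Bounding each partition to a single trading day avoids unbounded partition growth while keeping "latest N ticks for a symbol" a single-partition read.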
Samsung R&D Institute, Bangalore
Lead Engineer
Responsibilities:
- WebKit-based mobile browser development; modules: WebCore, WebKit, rendering, painting, layout
- Developed browser features and Dolfin as a shared object (SO); debugged hardware on Trace32, including watchdog issues
- Skills: C++, ARM/gcc/MinGW compilers, Wireshark, Firebug, Web Developer, DOM, CSS, JS, HTML
Wipro Technologies, Bangalore
Project Engineer
Responsibilities:
- Firmware feature development in the core modules: coding, debugging, review and unit testing for MFP printers already released in the market
- Skills: C++, Java, Linux, make, gdb, ClearCase
Freelance Big Data Trainer
- Taught the entire Big Data Hadoop stack to 500+ professionals worldwide via webinars & in-class sessions
- Subject Matter Expert for Jigsaw Academy; developed Big Data Analytics online course content
- Speaker on Hadoop, Hive, HBase & Big Data at the Thomson Reuters Tech Connect conference
