
Big Data Cloud Engineer Resume


PROFESSIONAL SUMMARY:

  • 7 years of experience as a Big Data Developer/Engineer driving cutting-edge technology
  • 13 years of overall experience as a programming polyglot in IT, software design and development
  • Hands-on with the Hadoop/Spark stack, Data Lake, NoSQL, Kafka & streaming, on-premise and in the cloud
  • Well versed in Amazon Web Services and Google Cloud Platform for storage and compute
  • In-depth knowledge of writing MapReduce and Apache Spark jobs in Scala/Java
  • Lead Big Data Engineer onsite for 5 years in the IP & Science and Finance & Risk divisions
  • Experienced in server-side financial software development using tick/time-series data
  • Interested in emerging technology, choosing NoSQL options, data structures & algorithms, and performance tuning
  • Developed an advanced human performance platform for an Athlete Management System on AWS with Spark
  • Proficient in Hive and Pig for OLAP over large-scale, continually growing unstructured/semi-structured data warehouses
  • Design & implement data pipelines, data modeling/massaging, and optimize ETL workflows
  • Knowledgeable in data ingestion to HDFS from RDBMS and other sources using Sqoop/Flume
  • Proven track record in installing, troubleshooting & managing Cloudera and Hortonworks multi-node clusters
  • Visa particulars: Canada Permanent Resident, citizenship due in Sep 2019, valid USA B1

TECHNICAL SKILLS:

Hadoop: MapReduce, HDFS, YARN, Hive, Pig, Sqoop, Flume, Oozie, Hue, HBase

Spark: Spark Core, Spark SQL, Spark Streaming, Mesos

Others: NoSQL, Cassandra, Phoenix, Zookeeper, Kafka, NiFi, Ambari, Qubole

Cloud DevOps (AWS): console, CLI, S3, EBS, Glacier, IAM, VPC, EC2, EMR, Kinesis, RDS

Cloud DevOps (GCP): console, shell, GCS, IAM, GCE, DataProc, Beam, DataFlow, BigQuery

Programming: Java, Scala, C, C++, Python (beginner), Shell Script, JavaScript, SQL, Testing

File Formats: Columnar (ORC, Parquet), JSON, XML, Avro, BSON, Protocol Buffers

Tools: Eclipse, IntelliJ, Maven, GitHub, SVN, JUnit, Cygwin, Tomcat, Jira, MS Visio

Architecture: Big Data, Distributed systems, Stock tick data, OOAD, Architectural patterns

Domain Experience: Finance, E-commerce, Sports Science, Telecom, Content Technology, Intellectual Property & Science

Data: Data Lake, Data Modeling, UML, Star Schema

Methodology: Agile, Scrum, SDLC, Waterfall

WORK EXPERIENCE:

Confidential

Big Data Cloud Engineer

Responsibilities:

  • Focused on improving how our customer allocates inventory to its store locations, taking into account sales forecasts and regional effects
  • Architected the Replenishment Bot for production from the development standpoint
  • RDBMS data ingestion to GCP: migrated a 4 TB compressed/encrypted Oracle raw dump to persistent storage, set up an Oracle DB on GCE, and exported CSV data via Data Pump to a GCS bucket
  • Developed routines for a one-time migration of 36 TB of exported data from the persistent store to GCS
  • Cleansed & preprocessed data using Scala & Spark on a DataProc cluster for the Data Science team, who applied machine learning techniques to create various bots
  • Imported data from Google Cloud Storage to BigQuery as datasets for analytics
  • Used Cloud Composer workflows & the Airflow scheduler to run the job automatically every week
  • Optimized Spark execution to finish the job in 1 hour instead of 11 & opened the door to upscaling
  • Solved an infrastructure problem to utilize the cluster’s full capacity: executors, executor cores & RAM
  • Troubleshot a disk-out-of-space issue: Spark was not using all available RAM and spilled intermediate data to disk, which was never freed because the job never completed; the excessive disk I/O caused slow job execution
  • Diagnosed the project’s blocking issues and the Hadoop/Spark cluster’s infrastructure limitations: antivirus, blocked ports, CPU/cores/RAM & number of nodes
  • Skills: Java, Scala, Spark ecosystem, GCP, GCE, GCS, GCFS, Oracle, EDW, Scripting, MS Visio
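A back-of-the-envelope sketch of the executor-sizing reasoning behind the tuning above; the node specs, the 5-cores-per-executor rule of thumb, and the 10% memory-overhead figure are illustrative assumptions, not the actual cluster's values:

```python
def size_executors(cores_per_node, ram_gb_per_node, num_nodes,
                   cores_per_executor=5, overhead_frac=0.10):
    """Rough executor sizing for a Spark-on-YARN cluster: reserve one
    core and 1 GB per node for the OS and Hadoop daemons, pack
    fixed-size executors onto each node, and deduct the memory-overhead
    fraction from each executor's heap."""
    usable_cores = cores_per_node - 1
    usable_ram_gb = ram_gb_per_node - 1
    executors_per_node = usable_cores // cores_per_executor
    ram_per_executor = usable_ram_gb / executors_per_node
    heap_gb = int(ram_per_executor * (1 - overhead_frac))
    return {
        "num_executors": executors_per_node * num_nodes - 1,  # one slot left for the driver
        "executor_cores": cores_per_executor,
        "executor_memory_gb": heap_gb,
    }

# Hypothetical 10-node cluster of 16-core / 64 GB machines
print(size_executors(16, 64, 10))
# → {'num_executors': 29, 'executor_cores': 5, 'executor_memory_gb': 18}
```

Sizing executors this way, instead of taking YARN's defaults, is what lets a job use all of a node's cores and RAM rather than spilling to disk.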

Confidential

Advanced Analytics & Data Strategy Assessment

Responsibilities:

  • Conducted interviews & workshops with the IT leadership team to understand current-state pain points
  • Evaluated Azure, Amazon & Google cloud offerings for a Data Lake implementation
  • Led current-state analysis by investigating existing functionality & architectural stacks
  • Analyzed the gap between As-Is and To-Be, issues faced in DB synchronization, data deduplication, etc.
  • Developed a target architecture & delivery model to meet the client’s Advanced Analytics needs
  • Skills: Big Data, AWS, GCP, Azure, Data Warehouse, PowerPoint, diagramming tools, MS Visio
Kinduct Technologies, Halifax

Big Data Engineer

Responsibilities:

  • Developed an advanced human performance platform for an Athlete Management System (sports science)
  • Implemented data collection and backend processing for a sports healthcare analytical system
  • Built a data pipeline using Scala & Spark that transforms raw game data into business-relevant insights
  • Set up secure, reliable Big Data infrastructure for customers from the ground up on AWS Cloud
  • Set up Hortonworks HDP Hadoop environments & used Ambari to configure them
  • POC Project Aggregation: successfully transitioned a proof of concept to production on AWS
  • Redesigned & developed the Dynamic Reports app to use Spark SQL for complex aggregated metrics, connecting to the production environment that supplies the report data
  • Scala/Spark: processed raw XY game data in formats such as JSON/XML for the Data Science team
  • NBA production data providers (XML/JSON): Sportradar, Second Spectrum, SportVU
  • Skills: Java, Scala, Hadoop/Spark ecosystem, AWS services, JSON, XML, Parquet, ORC
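The raw-game-data-to-insights transformation above can be sketched in miniature. The production pipeline ran on Scala/Spark; this pure-Python version uses an invented event shape (`player_id`, `x`, `y`) and an invented metric (distance covered) purely to illustrate the fold from raw tracking events to a per-player aggregate:

```python
import json
from collections import defaultdict

def player_distance_totals(raw_events):
    """Fold raw XY tracking events (one JSON object per line) into
    total distance covered per player -- a miniature stand-in for the
    aggregated metrics the Spark pipeline produced."""
    last_pos = {}
    totals = defaultdict(float)
    for line in raw_events:
        e = json.loads(line)
        pid, x, y = e["player_id"], e["x"], e["y"]
        if pid in last_pos:
            px, py = last_pos[pid]
            totals[pid] += ((x - px) ** 2 + (y - py) ** 2) ** 0.5
        last_pos[pid] = (x, y)
    return dict(totals)

events = [
    '{"player_id": "p1", "x": 0, "y": 0}',
    '{"player_id": "p1", "x": 3, "y": 4}',
    '{"player_id": "p2", "x": 1, "y": 1}',
]
print(player_distance_totals(events))  # → {'p1': 5.0}
```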
Thomson Reuters, Bangalore

Big Data Lead Engineer

Responsibilities:

  • Involved in the complete life cycle of custom dataset metrics generation from Web of Science data for publishers, researchers & universities; developed & optimized Hive queries for better throughput
  • Designed a DB schema in HDFS for research articles in the form of cubes with fact & dimension tables
  • Computed metrics such as Times Cited, Hot Paper, Highly Cited Paper, university rank & h-index
  • Installed a multi-node Cloudera CDH 5.4 Hadoop cluster, configured with Cloudera Manager
  • Optimized performance of the IMS ingestion system to HDFS/HBase from the Oracle-based MPR
  • Used Sqoop extensively to import/export structured data between HDFS and RDBMS
  • Developed MapReduce jobs for data cleansing and Hive-to-HBase loads for interactive & row-level updates
  • Data Lake Content Integration: captured & ingested grant & funding data from external sources
  • Data Lake real-time cross-linking of literature (WOS) & patent (TI) documents into AWS Cloud, creating one platform for TR-owned data to leverage for valuable insights
  • Implemented a Lambda Architecture: Spark Streaming, Kafka, HBase, Hive, patents resolution
  • Collected & aggregated large amounts of log data using Flume, staging the data in HDFS
  • Developed custom UDFs, SerDes & InputFormats in Java to extend Hive/Pig execution
  • Designed & implemented the backend of the ALUM Charts application using HBase, Spark & FusionCharts to visualize article access trends over time & the correlation between accesses and the citations an article receives
  • Processed usage logs, filtering out bot sessions and unproductive data
  • Optimized performance to deal with common bottlenecks: disk, network I/O, RAM, CPU/threads
  • Improved grant information coverage in WOS; created business rules to add & de-dupe grant info
  • Data Modeling: star schema modeling, cubes, fact & dimension tables, Visio
  • Environment: Cloudera CDH 5.4, Hadoop 2.6; cluster: 160 nodes; total data size: 3 petabytes
  • Skills: Java, Scala, Hadoop/Spark ecosystem, AWS Cloud, NoSQL, Data Lake, Data Modeling
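The bot-session filtering mentioned above can be sketched as follows; the session shape and both thresholds are hypothetical stand-ins, not the production business rules:

```python
def filter_bot_sessions(sessions, max_requests=1000, min_avg_gap_s=0.5):
    """Keep only sessions that look human: not too many requests, and
    requests not arriving faster than a person plausibly clicks.
    Thresholds here are illustrative only."""
    human = []
    for s in sessions:
        stamps = s["timestamps"]
        if len(stamps) > max_requests:
            continue  # far too many hits for one session
        if len(stamps) > 1:
            avg_gap = (stamps[-1] - stamps[0]) / (len(stamps) - 1)
            if avg_gap < min_avg_gap_s:
                continue  # machine-speed request rate
        human.append(s)
    return human

sessions = [
    {"id": "a", "timestamps": [0.0, 5.0, 9.0]},       # human-paced
    {"id": "b", "timestamps": [0.0, 0.1, 0.2, 0.3]},  # machine-fast
]
print([s["id"] for s in filter_bot_sessions(sessions)])  # → ['a']
```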
  • Developed server-side financial software for Velocity Analytics (Reuters Tick Capture Engine)
  • Understood financial risk management: trades, quotes, corporate actions, stock splits, symbology, etc.
  • Involved in multiple successful POCs, including a horizontal-scaling solution
  • Productized the Elektron TimeSeries DB migration from a proprietary archive to a NoSQL store
  • Managed & analyzed stock tick data, market data feeds, and Data & Tick History data
  • Implemented the Firebird data migration to enhance read-write performance
  • Ingested fact summaries into HBase as the data store and implemented query sharding for the Elektron TS hosted solution
  • Prototyped a Cassandra-based time-series data store and data model design for real-time data ingestion
  • Skills: C++, Cassandra, Hadoop, HDFS, HBase, TCL, SQL, Batch, financial trading
  • Finance skills: Reuters Tick Capture Engine, RMDS, time series & data, Tick History
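The time-series data-model idea behind the Cassandra prototype can be illustrated with a common bucketing scheme; the symbol, key format, and per-day granularity are assumptions for illustration, not the actual schema:

```python
from datetime import datetime, timezone

def tick_partition_key(symbol, epoch_s):
    """Bucket ticks by symbol + UTC trading day so one day's ticks for
    a symbol share a partition, leaving the timestamp to serve as the
    clustering key -- shown purely to illustrate the layout idea."""
    day = datetime.fromtimestamp(epoch_s, tz=timezone.utc).strftime("%Y%m%d")
    return f"{symbol}:{day}", epoch_s

pk, ts = tick_partition_key("TRI.TO", 1546300800)
print(pk)  # → TRI.TO:20190101
```

Bounding each partition to one symbol-day keeps partitions a manageable size while still letting range scans over a day's ticks hit a single partition.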
Samsung R&D Institute, Bangalore

Lead Engineer

Responsibilities:

  • WebKit-based mobile browser development; modules: WebCore, WebKit, rendering, painting, layout
  • Developed browser features, built Dolfin as a shared object (SO), debugged hardware on Trace32 & watchdog issues
  • Skills: C++, ARM/gcc/MinGW compilers, Wireshark, Firebug, Web Developer, DOM, CSS, JS, HTML
Wipro Technologies, Bangalore

Project Engineer

Responsibilities:

  • Firmware feature development in the core modules: coding, debugging, review and unit testing for MFP printers already released in the market
  • Skills: C++, Java, Linux, make, gdb, ClearCase
Freelance Big Data Trainer

Responsibilities:

  • Taught the entire Big Data Hadoop stack to 500+ professionals worldwide via webinars & in-class sessions
  • Subject Matter Expert for Jigsaw Academy; developed Big Data Analytics online course content
  • Speaker on Hadoop, Hive, HBase & Big Data at the Thomson Reuters Tech Connect conference
