
Sr. Bigdata AWS Lead Resume


GA

SUMMARY

  • 13+ years of IT experience in Data Warehousing and Big Data data lakes, with emphasis on project planning and management, business requirement analysis, application design, development, testing, implementation, and maintenance of data warehouses using a wide range of technologies such as AWS (EMR, S3), Google Cloud, Google BigQuery, ML/AI, the Big Data Hadoop stack, Redshift, Teradata, SQL Server, Informatica, Unix, Spark (Scala), Python, and Erwin data modelling.
  • 7+ years of experience as an ETL Developer and Big Data Architect building enterprise data warehouse applications using Teradata, Informatica, Erwin data modelling, and the full Big Data Hadoop stack on the AWS cloud, plus 5+ years of relevant experience in Google Cloud application development, building scalable, high-performance Big Data analytical systems with specialization in BigQuery and the Hadoop platform.
  • Dynamic data analytics leader with a successful track record of building data warehouses and business intelligence and analytics solutions that empower companies to harness and monetize their data assets.
  • Broad knowledge and perspective in data pipeline, data collection, data management, data engineering, reporting, analytics, and product/application development.
  • Solid understanding of analytical and transactional databases across Teradata EDW, Google Cloud, and Hadoop systems.
  • Played a pivotal role in building a centralized Enterprise Data Hub using Hadoop with AWS cloud and Google Cloud Platform that caters to all the data analytics needs of an enterprise.
  • Good experience in architecting real-time streaming applications and batch-style, large-scale distributed computing applications using tools such as Spark, Sqoop, Flume, MapReduce, and Hive.
  • Good experience in creating complex data ingestion pipelines, data transformations, data management and data governance, and real-time streaming engines at an enterprise level.
  • Good experience in developing and implementing big data solutions and data mining applications on Hadoop using Hive, Pig, HBase, Cassandra, Hue, and Oozie workflows, and in designing and implementing Spark Scala and Python programs.
  • Good experience in writing Spark jobs using Scala and Maven.
  • Expertise in working with large collections of data sets using Spark SQL and in-memory DataFrames and Datasets.
  • Hands-on experience loading CSV, Avro, Parquet, XML, and JSON file formats via Spark Scala and PySpark programming (a short Spark Scala sketch follows this list).
  • Implementation and support of Apache Solr solutions.
  • Constructed data pipelines by developing Python jobs and deploying them on Google Cloud Platform.
  • Processed different file types (CSV, XML, JSON) and storage formats such as ORC and other compressed formats in Google Cloud and BigQuery.
  • Well versed in installing, configuring, supporting, managing, and fine-tuning petabyte-scale Hadoop clusters.
  • Experience in analysis and design, performance tuning, query optimization, and developing stored procedures, functions, packages, triggers, views, and indexes to implement database business logic in Teradata and SQL Server, and in loading data warehouse tables (dimension, fact, and aggregate tables) using SSIS and Teradata utilities.
  • Experience in relational and dimensional data modeling, star schema modeling, and physical and logical data modeling using Erwin 7/7.3.
  • Experience in using Teradata Administrator, Teradata Manager, Teradata PMON, and Teradata SQL Assistant, and in writing Teradata load/export scripts with BTEQ, FastLoad, MultiLoad, TPump, TPT, and FastExport in UNIX/Windows environments.
  • Expert in NoSQL databases: MongoDB, Cassandra, and HBase.
  • Expert in Spark Scala programming, Spark SQL, Unix scripting, Hive/Beeline, Pig, Sqoop, Flume, Splunk, and Hue.
  • Constructed data pipelines and hosted all Hadoop applications on Google Cloud Platform and AWS EMR.
  • Good experience in Google Cloud, BigQuery, Bigtable, AWS Redshift, and S3.
  • Good experience in writing Unix shell scripts.
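
The multi-format loading and Spark SQL bullets above can be illustrated with a minimal Spark Scala sketch; the bucket paths, column names, and view names below are hypothetical placeholders, and Avro/XML sources would additionally need the spark-avro and spark-xml packages.

    import org.apache.spark.sql.SparkSession

    object MultiFormatLoad {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("multi-format-load")
          .enableHiveSupport()
          .getOrCreate()

        // CSV with a header row; schema inferred for brevity
        val customers = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("s3://example-bucket/landing/customers.csv")

        // JSON read through the same DataFrame API
        // (spark.read.parquet(...) works analogously for Parquet sources)
        val orders = spark.read.json("s3://example-bucket/landing/orders.json")

        // Register temp views and combine with Spark SQL
        customers.createOrReplaceTempView("customers")
        orders.createOrReplaceTempView("orders")
        val joined = spark.sql(
          """SELECT c.customer_id, c.name, o.order_id, o.amount
            |FROM customers c JOIN orders o ON c.customer_id = o.customer_id""".stripMargin)

        // Persist the result as Parquet for downstream jobs
        joined.write.mode("overwrite").parquet("s3://example-bucket/curated/customer_orders")

        spark.stop()
      }
    }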

TECHNICAL SKILLS

NoSQL Databases: Cassandra, MongoDB, and HBase

AWS Technologies: AWS Cloud, EMR, S3 and Lambda

Cloud Technologies: Google Cloud - BigQuery, GCS, Dataprep, Data Studio, Stackdriver, Composer, App Engine, Regression Analysis, Data Mining, Prediction Analysis, Machine Learning, Vision, Natural Language Processing

Big Data Technologies: Hadoop - Spark, MapReduce, Sqoop, Hive, Flume, Hue, Oozie workflows, Ambari, Pig, Kafka and NiFi

Databases: Oracle, Teradata, Microsoft SQL Server, MS Access

Programming Languages: Python, Scala, UNIX Shell Scripting, SQL, PL/SQL

Database Utilities: BTEQ, FastLoad, TPT, MultiLoad, FastExport, TPump

ETL Tools: Informatica, Talend, Teradata Tools & Utilities, Teradata SQL Assistant

Reporting Tools: Alteryx, Tableau and BigQuery

Web Servers: IIS 6.0, Apache 1.3.x, Tomcat, Windows 2003/2008

Other Tools: GitHub, SourceTree, Jenkins, Control-M

PROFESSIONAL EXPERIENCE

Confidential, GA

Sr. Bigdata AWS Lead

Technologies Used: Sqoop, Spark 1.6/2.1, Scala 2.10.5/2.11.11, Python, Maven, Shell Scripting, SourceTree, GitHub, Eclipse, Splunk, MongoDB, SQL Server, Oracle, Postman, Jenkins, Hadoop YARN, Hive, Hue, Pig, Oozie, NiFi, HBase, Solr, AWS EMR, S3 and Redshift.

Responsibilities:

  • As part of the Big Data Center of Excellence (COE), responsible for creating technical guidance, roadmaps, and strategies for delivering various big data solutions throughout the Confidential Total Source project.
  • Involved in building a centralized, enterprise-level Big Data platform across the Total Source project.
  • Fine-tuned and stabilized the Hadoop platform to allow real-time and batch-style Big Data applications to run smoothly with optimal cluster utilization.
  • Created POCs using Spark Streaming and Flume for real-time processing of continuous streams of large data sets (a streaming sketch follows this list).
  • Involved in the design and implementation of a Hadoop ingestion framework for external sources using Spark JDBC, Sqoop, and Flume; external sources include Oracle, SQL Server, Teradata, MySQL, Hive, MongoDB, HBase, and Redshift (a JDBC ingestion sketch also follows this list).
  • Designed and developed the batch process to ingest large amounts of structured/unstructured data into AWS Redshift.
  • Implemented a predictive model on Spark, making extensive use of RDDs, DAGs, Spark DataFrames, Spark SQL, and Spark Streaming.
  • Created workflows and scheduled jobs for the application using Oozie coordinators and Control-M.
  • Followed Agile methodologies to deliver the projects successfully in sprints.
  • Implementation and support of Apache Solr solutions.
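
The streaming POC bullet above can be pictured with a minimal Spark Streaming (DStream) sketch; a socket source stands in for the Flume channel used in the actual POC, and the host, port, and batch interval are hypothetical.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object StreamingPoc {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("streaming-poc")
        val ssc  = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

        // Hypothetical socket source; the real POC consumed events from Flume
        val lines = ssc.socketTextStream("stream-host", 9999)

        // Count events per micro-batch as a stand-in for the real transformation logic
        val counts = lines.map(_ => 1L).reduce(_ + _)
        counts.print()

        ssc.start()
        ssc.awaitTermination()
      }
    }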
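
The ingestion framework bullet above is sketched below in its simplest Spark JDBC form; the JDBC URL, credentials, source table, partitioning bounds, and target Hive table are hypothetical placeholders.

    import org.apache.spark.sql.SparkSession

    object JdbcIngest {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("jdbc-ingest")
          .enableHiveSupport()
          .getOrCreate()

        // Pull a source table over JDBC, partitioned on a numeric key for parallel reads
        val source = spark.read.format("jdbc")
          .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCL")
          .option("dbtable", "SALES.ORDERS")
          .option("user", "etl_user")
          .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
          .option("partitionColumn", "ORDER_ID")
          .option("lowerBound", "1")
          .option("upperBound", "10000000")
          .option("numPartitions", "8")
          .load()

        // Land the data in a Hive-managed Parquet table for downstream processing
        source.write.mode("overwrite").format("parquet").saveAsTable("staging.orders")

        spark.stop()
      }
    }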

Confidential, Anthem, GA

Sr. Bigdata AWS Lead

Technologies Used: Sqoop, Spark 1.6/2.1, Scala 2.10.5/2.11.11, Python, Java, Maven, Shell Scripting, Hive, REST API, SourceTree, Eclipse, Splunk, MongoDB, SQL Server, Oracle, Postman, GitHub, Control-M, Spring Boot, Bamboo, Docker, SOA APIs, Hadoop, Hue, Oozie, NiFi, HBase, AWS EMR, S3 and Redshift

Responsibilities:

  • Participated from the start of the project, from gathering requirements and designing with the business stakeholders to finalizing the stack components and the implementation.
  • Performed data ingestion using Sqoop from various sources such as Oracle, SQL Server, and BDF (big data fabric) data into Hive tables.
  • Strong hands-on experience in writing shell scripts.
  • Good hands-on experience scheduling jobs through Control-M.
  • Developed a Spark engine job that loads data from Hive tables into Mongo collections.
  • Strong hands-on experience in Spark Scala programming, Java, and REST API development.
  • Implemented many features in the Spark engine job, such as CDC (Change Data Capture) logic, an audit process, historical and incremental load processing, and optimization of the Spark engine jobs using Scala programming.
  • The CDC logic captures newly updated and inserted records from the Hive staging layer, drops duplicate records, and loads the new records from the different sources into Mongo collections (a CDC sketch follows this list).
  • Worked on data quality and optimized Spark queries to resolve performance issues.
  • Developed Java REST APIs and deployed them in Docker.
  • Developed APIs for GET and POST calls as per the end-to-end business logic.
  • Migrated all applications to an AWS EMR cluster, S3, and Redshift.
  • Implementation and support of Apache Solr solutions.
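
The CDC step described above can be sketched as a window-based dedup in Spark Scala; the table names, key and timestamp columns, and Mongo connector options are hypothetical, and the connection URI is assumed to be set in the Spark config (option names vary by connector version).

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{col, row_number}

    object CdcEngine {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("cdc-engine")
          .enableHiveSupport()
          .getOrCreate()

        // Read newly inserted/updated records from the Hive staging layer
        val staging = spark.table("staging.member_updates")

        // Keep only the latest version of each business key, dropping duplicates
        val latestFirst = Window.partitionBy(col("member_id")).orderBy(col("updated_ts").desc)
        val changed = staging
          .withColumn("rn", row_number().over(latestFirst))
          .filter(col("rn") === 1)
          .drop("rn")

        // Append the changed records to a Mongo collection via the Spark-Mongo connector
        changed.write
          .format("mongodb")
          .mode("append")
          .option("database", "members")
          .option("collection", "member_profile")
          .save()

        spark.stop()
      }
    }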

Confidential, GA

Sr. Bigdata Hadoop and Google Cloud Developer

Technologies Used: Sqoop, Spark 1.6/2.1, Scala 2.10.5/2.11.11, Python, Java, Maven, Shell Scripting, Hive, Jenkins, GitHub, Grid Stats, Eclipse, JUnit, Splunk, Google Cloud (BigQuery, Bigtable), Cassandra, MongoDB, Talend, SQL Server, Oracle, Postman, Solr and Maestro-TWS.

Responsibilities:

  • Documented detailed design, functional, and technical specifications for developing Lucidworks Fusion and Apache Solr solutions.
  • Created end-to-end Spark applications using Scala to perform various data cleansing, validation, transformation, and summarization activities according to the requirements.
  • Developed Spark Scala jobs to load Citi feed text/JSON data into Hive Parquet tables in the production environment, which reduced disk usage and minimized cost in the Hadoop ecosystem (a sketch follows this list).
  • Developed Spark applications using Scala to pull data from different sources (Oracle, SQL Server, Teradata, and MySQL databases) using JDBC connections.
  • As part of the Epiphany feeds (legacy system) retirement process, delivered and migrated the Foresee, Acxiom Preferences, and Email ID Stamping feeds from SQL Server to the Hadoop ecosystem, saving cost for the company.
  • Involved in the development of Home Depot applications: ECC, CCA, CGR and SVOC.
  • Developed ETL mappings and workflows using Talend and ingested data into the data lake.
  • Created jobs and constructed data pipelines and workflows using Airflow, Composer, App Engine, and Compute Engine on Google Cloud Platform.
  • Developed jobs using Spark Scala and Python on Google Cloud Platform.
  • Expert in Google Cloud, GCS, BigQuery, Bigtable, and related Google Cloud components.
  • Analyzed data using BigQuery and generated reports using Tableau.
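
The Citi feed bullet above is the kind of load the following minimal Spark Scala sketch illustrates; the feed path, the added partition column, and the target Hive table are hypothetical placeholders.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.lit

    object FeedToParquet {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("feed-to-parquet")
          .enableHiveSupport()
          .getOrCreate()

        // Read the raw JSON feed landed on HDFS/GCS
        val feed = spark.read.json("gs://example-bucket/landing/citi_feed/2019-01-01/")

        // Write it as a partitioned, Snappy-compressed Parquet Hive table to cut disk usage
        feed
          .withColumn("feed_date", lit("2019-01-01"))
          .write
          .mode("overwrite")
          .option("compression", "snappy")
          .format("parquet")
          .partitionBy("feed_date")
          .saveAsTable("curated.citi_feed")

        spark.stop()
      }
    }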
