
Big Data Architect Resume


Alexandria, VA

SUMMARY:

  • Over 12 years of experience in development, analysis, design, integration, and presentation with ETL tools, including 4 years of Big Data/Hadoop experience across the Hadoop ecosystem: Hive, Pig, Flume, Sqoop, ZooKeeper, HBase, Spark, Kafka, Python, and AWS.
  • Worked in India, Sweden, and Denmark, and currently working in the USA on various technical, functional, and business assignments.
  • Solid understanding of Hadoop MRv1 and MRv2 (YARN) architecture.
  • Hands-on experience with core Java and solid experience debugging MapReduce applications.
  • Strong experience building and debugging Spark and Scala applications using functional programming concepts.
  • Imported data into HDFS from various relational database systems using Sqoop.
  • Worked on standards and proofs of concept in support of CDH4 and CDH5 implementations using AWS cloud infrastructure.
  • Hands-on experience implementing complex business logic and optimizing queries in HiveQL, controlling data distribution through partitioning and bucketing to improve performance (see the sketch after this list).
  • Good knowledge of architecture, design, re-engineering, and performance optimization.
  • Experienced in building highly scalable big data solutions using multiple Hadoop distributions (Cloudera, Hortonworks) and NoSQL platforms (HBase and Cassandra).
  • Experienced with the Oozie workflow engine for running workflow jobs with Hadoop MapReduce and Pig actions.
  • Very good knowledge of logical and physical data modeling, creating new data models, data flows, and data dictionaries. Hands-on experience with relational databases such as Oracle and MS SQL Server.
  • Experienced with version control systems like SVN and Serena.
  • Extensively worked on ETL processing tools like Pentaho and SSIS.
  • Knowledge of different reporting tools (SSRS, Cognos).
  • Exceptional ability to learn and master new technologies and to deliver results under tight deadlines.
  • Quick learner, with a high degree of passion and commitment to work.
  • Domain expertise in the health insurance, telecom, finance, and airline domains.
  • Proficient at mentoring and onboarding engineers who are new to Hadoop and getting them up to speed quickly.
  • Knowledge of MPP databases such as Impala and SQL Server APS (PolyBase).
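
A minimal sketch of the partitioning and bucketing technique referenced above, shown with Spark's DataFrame writer rather than raw HiveQL (the HiveQL equivalent uses PARTITIONED BY and CLUSTERED BY ... INTO n BUCKETS); the path, table, and column names are hypothetical.

```python
from pyspark.sql import SparkSession

# Sketch: control data distribution by partitioning on a date column and
# bucketing on a high-cardinality join key. Names are illustrative only.
spark = (SparkSession.builder
         .appName("partition-bucket-sketch")
         .enableHiveSupport()
         .getOrCreate())

claims = spark.read.parquet("hdfs:///data/staging/claims/")  # hypothetical path

(claims.write
    .partitionBy("load_date")          # partition pruning for date filters
    .bucketBy(32, "member_id")         # co-locate rows that join on member_id
    .sortBy("member_id")
    .format("parquet")
    .mode("overwrite")
    .saveAsTable("claims_curated"))

# Queries filtering on load_date now read only the matching partitions.
spark.sql("SELECT count(*) FROM claims_curated WHERE load_date = '2017-01-01'")
```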

TECHNICAL SKILLS:

Platforms/Cloud Services: Windows, Ubuntu, Linux, CentOS 7, EMR, S3

Big Data Technologies: Apache Hadoop, HDP 2.2.4, CDH5, Hive, Sqoop, Pig, Flume, ZooKeeper, HBase, Solr, Spark, Storm, Kafka

Machine Learning Platforms: Apache Mahout

Web Servers: Apache 1.3/2.x, IIS 5.0/6.0

Application Servers: BEA Weblogic, Apache Tomcat 5.x/7.x

Languages: Shell scripting, HTML, XML, SQL, C#

Packages: J2EE, JDK

Databases: Oracle, SQL Server, MySQL

Tools: Apache Solr, Jenkins, Matlab, JIRA, Visual Studio 2010/2008

Version Control: SVN, GitHub, CVS

BI Tools: SSIS, SSRS, SSAS & Cognos Report Studio 10

PROFESSIONAL EXPERIENCE:

Confidential, Alexandria, VA

Big Data Architect

Technologies and Frameworks: Python, Spark, Hive, YARN, NiFi, Kibana, and Elasticsearch

Responsibilities:

  • Defined the logic for calling the CMS REST API to retrieve the documents for a patent application and loading them into HDFS.
  • Designed and implemented ETL pipelines in Spark using Python.
  • Responsible for developing NiFi flows, creating schedules for data extraction, and running pipelines by identifying their dependencies.
  • Designed ETL flows that combine data from different sources and export the results to Elasticsearch (a minimal sketch follows this list).
  • Responsible for developing column mappings, transformations, and data profiling tasks.
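
A minimal sketch of the kind of PySpark ETL flow described above: combine documents landed in HDFS with application metadata and export to Elasticsearch. The paths, index name, and the use of the elasticsearch-hadoop connector are assumptions for illustration only.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Sketch: join raw CMS documents (already landed in HDFS as JSON) with
# application metadata, then export the enriched records to Elasticsearch.
spark = SparkSession.builder.appName("patent-docs-etl-sketch").getOrCreate()

docs = spark.read.json("hdfs:///data/raw/cms_documents/")        # hypothetical path
apps = spark.read.parquet("hdfs:///data/curated/applications/")  # hypothetical path

enriched = (docs
            .join(apps, on="application_id", how="left")
            .withColumn("ingested_at", F.current_timestamp()))

# Requires the elasticsearch-hadoop connector on the Spark classpath.
(enriched.write
    .format("org.elasticsearch.spark.sql")
    .option("es.nodes", "es-host:9200")        # hypothetical ES endpoint
    .option("es.resource", "patent_docs/_doc") # hypothetical index
    .mode("append")
    .save())
```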

Confidential, Waltham, MA

Big Data Sr. Consultant

Technologies and Frameworks: Python, Spark, AWS, Ec2, S3, Lambda, Airflow, Qubole and SBT

Responsibilities:

  • Defined the logic for identifying visitor-stitching use-case scenarios and for identifying visits and sub-visits.
  • Designed and implemented the ETL solution in Spark for visitor stitching.
  • Involved in processing data from AWS S3 buckets and loading it into EDW tables.
  • Worked on implementing optimization techniques such as persistence/caching and compression, gained through development experience (see the caching sketch after this list).
  • Responsible for developing column mappings, transformations, and data profiling tasks.
  • Responsible for scheduling ETL tasks in Airflow, an ETL task scheduler and dependency manager (a DAG sketch also follows this list).
  • Worked extensively on debugging and making changes to the Spark ETL solution.
  • Configured Git repositories and used various sbt plugins to manage build dependencies.
  • Developed a testing framework in Scala to test different use cases in visitor stitching.
  • Deployed the built JAR files to Qubole and verified that the Spark environment spun up correctly with the latest changes.
  • Loaded data from Parquet files into APS tables using scheduled jobs.
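
A minimal sketch of the persistence/caching and compression techniques noted above: an intermediate visit-level DataFrame reused by two downstream aggregations is persisted once, and the outputs are written as snappy-compressed Parquet. The bucket, paths, and column names are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark import StorageLevel

spark = SparkSession.builder.appName("visit-caching-sketch").getOrCreate()

events = spark.read.parquet("s3://example-bucket/clickstream/")  # hypothetical bucket

visits = (events
          .groupBy("visitor_id", "visit_id")
          .agg(F.count("*").alias("event_count")))

# Persist because both aggregations below reuse the same intermediate result.
visits.persist(StorageLevel.MEMORY_AND_DISK)

per_visitor = visits.groupBy("visitor_id").agg(F.sum("event_count").alias("events"))
top_visits = visits.orderBy(F.desc("event_count")).limit(100)

per_visitor.write.option("compression", "snappy").mode("overwrite") \
    .parquet("s3://example-bucket/edw/per_visitor/")
top_visits.write.option("compression", "snappy").mode("overwrite") \
    .parquet("s3://example-bucket/edw/top_visits/")

visits.unpersist()
```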
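A minimal sketch of an Airflow DAG that schedules the visitor-stitching ETL as described above: extract from S3, run the Spark job, then load the results. The DAG id, schedule, import path (Airflow 1.x style), and spark-submit arguments are assumptions for illustration.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # Airflow 1.x import path

default_args = {
    "owner": "data-eng",
    "retries": 1,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="visitor_stitching_etl",
    default_args=default_args,
    start_date=datetime(2018, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:

    extract = BashOperator(
        task_id="extract_from_s3",
        bash_command="aws s3 sync s3://example-bucket/raw/ /landing/raw/",
    )

    stitch = BashOperator(
        task_id="run_visitor_stitching",
        bash_command="spark-submit --master yarn stitching.jar",  # hypothetical artifact
    )

    load = BashOperator(
        task_id="load_edw_tables",
        bash_command="spark-submit --master yarn load_edw.py",    # hypothetical script
    )

    # Airflow resolves these dependencies and retries failed tasks per default_args.
    extract >> stitch >> load
```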

Confidential, Jersey City, NJ

Big Data Sr. Consultant

Technologies and Frameworks: Scala, HBase, Flume, Spark, Hive, Java, AWS, EC2, S3, Lambda, Kafka, and SBT

Responsibilities:

  • To react to fraud in a timely manner, developed a lookup that returns the history of any user within a few milliseconds, using the HBase NoSQL database and caches for quick retrieval of information (see the lookup sketch after this list).
  • Worked on various machine-learning models and applied them to the data to find correlations between different entities.
  • Hands-on experience with models such as linear regression, decision trees, Naïve Bayes, random forests, and k-means clustering; worked with the MLlib and GraphX libraries and Spark Streaming (a k-means sketch also follows this list).
  • Involved in loading data into HBase using the HBase shell and the HBase client API.
  • Very good understanding of Spark core concepts and internals, including partitions, memory management, the roles of executors and drivers, shuffling, optimization techniques, persistence/caching, and compression, gained through development experience.
  • Responsible for developing a data pipeline using Flume and Kafka to extract data and store it in HDFS.
  • Used Maven extensively to build JAR files for Java MapReduce programs and deployed them to the cluster.
  • Created multi-module projects using SBT, developed the build script, and used various sbt plugins to manage build dependencies.
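
A minimal sketch of the millisecond-latency user-history lookup backed by HBase. The original work used the JVM HBase client from Scala/Java; happybase (over the HBase Thrift gateway) is shown here only to keep the example in Python, and the host, table name, and row-key layout are assumptions.

```python
import happybase

# Sketch: point get by row key against a user-activity table.
connection = happybase.Connection("hbase-thrift-host")  # hypothetical Thrift host
table = connection.table("user_activity")               # hypothetical table

def lookup_user_history(user_id: str) -> dict:
    """Return all stored history columns for a user via a single row-key get."""
    row = table.row(user_id.encode("utf-8"))
    return {k.decode(): v.decode() for k, v in row.items()}

# Example: fetch the history consulted by the fraud-detection rules.
history = lookup_user_history("user-12345")
```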
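A minimal sketch of one of the models listed above (k-means clustering) using Spark MLlib's DataFrame API; the input path and feature columns are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("kmeans-sketch").getOrCreate()

txns = spark.read.parquet("hdfs:///data/features/transactions/")  # hypothetical path

# Assemble numeric features into the single vector column MLlib expects.
assembler = VectorAssembler(
    inputCols=["amount", "txn_count_7d", "distinct_merchants_7d"],
    outputCol="features",
)
features = assembler.transform(txns)

# Cluster transactions; unusually small clusters can be reviewed as
# candidate fraud patterns.
model = KMeans(k=8, seed=42, featuresCol="features").fit(features)
clustered = model.transform(features)  # adds a "prediction" column
```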

Confidential

Senior Big Data Developer

Responsibilities:

  • Responsible for ingesting data into HDFS using Sqoop from the relational databases SQL Server and DB2.
  • Responsible for reconciliation between the current records in Hive and new change records (see the reconciliation sketch below).
  • Responsible for performing extensive data validation using Hive.
  • Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases to compare with historical data.
  • Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, MongoDB, NoSQL sources, and a variety of portfolios.
  • Hands-on experience adjusting memory settings for map and reduce tasks and applying any other performance optimizations needed to make jobs execute within the agreed SLA time.
  • Responsible for debugging failed jobs in production.

Project Description: Safari Surprise monitors fares and fare changes by other airlines and distributes the information to the pricing analysts. It monitors competitors' airfares worldwide and matches them within 24 hours. The system prepares records for semi-automatic fare matching according to the parameters set by the pricing analysts.
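
A minimal sketch of the Hive reconciliation step mentioned above: compare the current table with the newly ingested change records and flag rows that are new or whose values differ. On the original project this ran as a HiveQL script; it is expressed through spark.sql here only to stay in Python, and the table and column names are assumptions.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-reconciliation-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Rows in the change feed that are either brand new or carry a different
# record hash than what is currently stored in Hive.
discrepancies = spark.sql("""
    SELECT chg.record_id,
           CASE WHEN cur.record_id IS NULL THEN 'NEW'
                ELSE 'CHANGED'
           END AS change_type
    FROM   claims_changes chg
    LEFT JOIN claims_current cur
           ON chg.record_id = cur.record_id
    WHERE  cur.record_id IS NULL
       OR  chg.record_hash <> cur.record_hash
""")

discrepancies.write.mode("overwrite").saveAsTable("reconciliation_report")
```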

Confidential

ETL Architect

Responsibilities:

  • Utilized SQL*Loader to load flat files into database tables.
  • Responsible for SQL tuning and optimization using ANALYZE, EXPLAIN PLAN, the TKPROF utility, and optimizer hints.
  • Responsible for performing code reviews.
  • Designed and created different ETL packages using SSIS, with Oracle as the source and SQL Server as the destination.
  • Installed, upgraded, and migrated databases from Oracle to MS SQL Server 2008.
  • Transformed data from various data sources over OLE DB connections by creating various SSIS packages.

Used: Oracle 8i/9i, PL/SQL, SQL*Plus, SQL, SQL*Loader, PL/SQL Developer, Explain Plan and TKPROF tuning utility, Microsoft SQL Server 2008, MS Windows Server 2003/2008, SSIS.

Confidential

Senior Database Developer and Analyst

Responsibilities:

  • Responsible for analyzing business functionality for the back-end processes.
  • SQL Server 2008 RDBMS database development using Confidential -SQL programming, queries, stored procedures, views.
  • Performed tuning and optimization on SQL queries using Explain Plan and optimizer hints.
  • Created SQL reports and data extraction and data loading scripts for different databases and schemas.
  • Developed SSIS packages using different task types and with error handling.
  • Developed reports in SSRS on SQL Server 2005.
  • Used SSIS to create ETL packages to Validate, Extract, Transform and Load data to Data Warehouse and Data Mart Databases.

Used: MS SQL Server 2005 Enterprise Edition, Windows 2003 Enterprise Edition, Windows 2000 Advanced Server, SSIS, and SSRS.

Confidential

ETL Developer

Responsibilities:

  • Estimation, analyzing FRDs, preparing design documents, writing unit test cases, coding, unit testing, and guiding team members.
  • Responsible for package deployment and development.
  • Responsible for troubleshooting production problems and writing queries for front-end designers.
  • Used DTS for data population from different sources and created DTS packages for data conversions and for loading data from flat files or Excel files.
  • Fine-tuned complex SQL queries and stored procedures to increase performance.
  • Worked as a developer creating complex stored procedures, DTS packages, triggers, cursors, tables, views, and other SQL joins and statements for applications.

Used: MS SQL Server 2000, MS .NET 2.0, Windows NT/2000 Server, MS Access, Query Analyzer, DTS, BCP, SQL Profiler.
