Big Data Architect Resume
Alexandria, VA
SUMMARY
- Over 12 years of experience in development, analysis, design, integration, and presentation with ETL tools, along with 4 years of Big Data/Hadoop experience across ecosystem components such as Hive, Pig, Flume, Sqoop, ZooKeeper, HBase, Spark, Kafka, Python, and AWS.
- Worked in India, Sweden, and Denmark; currently working in the USA on various technical, functional, and business assignments.
- Solid understanding of Hadoop MRv1 and MRv2 (YARN) architectures.
- Hands-on experience with core Java and solid experience debugging MapReduce applications.
- Strong experience building and debugging Spark applications in Scala using functional programming concepts.
- Imported data into HDFS from various relational database systems using Sqoop.
- Worked on standards and proofs of concept supporting CDH4 and CDH5 implementations on AWS cloud infrastructure.
- Hands-on experience implementing complex business logic and optimizing queries in HiveQL, controlling data distribution with partitioning and bucketing to enhance performance.
- Good knowledge of architecting, designing, re-engineering, and performance optimization.
- Experienced in building highly scalable big-data solutions on multiple Hadoop distributions (Cloudera, Hortonworks) and NoSQL platforms (HBase and Cassandra).
- Experienced with the Oozie workflow engine, running workflows with Hadoop MapReduce and Pig actions.
- Very good knowledge of logical and physical data modeling: creating new data models, data flows, and data dictionaries. Hands-on experience with relational databases such as Oracle and MS SQL Server.
- Experienced with version control systems like SVN and Serena.
- Extensive experience with ETL tools such as Pentaho and SSIS.
- Knowledge of reporting tools (SSRS, Cognos).
- Exceptional ability to learn and master new technologies and to deliver on short deadlines.
- Quick learner, with a high degree of passion and commitment to work.
- Domain expertise in the Health Insurance, Telecom, Finance, and Airline domains.
- Proficient at mentoring and onboarding engineers new to Hadoop and getting them up to speed quickly.
- Knowledge of MPP databases such as Impala and SQL Server APS (PolyBase).
TECHNICAL SKILLS
Platforms/Cloud Services: Windows, Ubuntu, Linux, CentOS 7, EMR, S3
Big Data Technologies: Apache Hadoop, HDP 2.2.4, CDH5, Hive, Sqoop, Pig, Flume, ZooKeeper, HBase, Solr, Spark, Storm, Kafka
Machine Learning Platforms: Apache Mahout
Web Servers: Apache 1.3/2.x, IIS 5.0/6.0
Application Servers: BEA WebLogic, Apache Tomcat 5.x/7.x
Languages: Shell scripting, HTML, XML, SQL, C#
Packages: J2EE, JDK
Databases: Oracle, SQL Server, MySQL
Tools: Apache Solr, Jenkins, MATLAB, JIRA, Visual Studio 2010/2008
Version Control: SVN, GitHub, CVS
BI Tools: SSIS, SSRS, SSAS, Cognos Report Studio 10
PROFESSIONAL EXPERIENCE
Confidential, Alexandria, VA
Big Data Architect
Responsibilities:
- Defined the logic for calling the CMS REST API to retrieve the documents for a patent application and loaded them into HDFS.
- Designed and implemented ETL pipelines in Spark using Python.
- Responsible for developing NiFi flows, creating schedules for data extraction, and running pipelines after identifying their dependencies.
- Designed ETL flows that combined data from different sources and exported the results to Elasticsearch.
- Responsible for developing column mappings, transformations, and data profiling tasks.
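As a hedged illustration of the column-mapping step above (the field names and mapping dictionary are hypothetical, not taken from the actual project), such a transformation can be written as a pure Python function and applied per record in a Spark pipeline:

```python
# Hypothetical column-mapping/transformation sketch; the mapping and field
# names are illustrative assumptions, not the project's real schema.
COLUMN_MAP = {
    "applId": "application_id",
    "docCode": "document_code",
    "mailDt": "mail_date",
}

def transform_record(raw: dict) -> dict:
    """Rename source columns to target names and drop unmapped fields."""
    out = {}
    for src, dst in COLUMN_MAP.items():
        if src in raw:
            out[dst] = raw[src]
    return out

record = {"applId": "16123456", "docCode": "CTNF", "extra": "ignored"}
print(transform_record(record))
```

In a Spark job this kind of function would typically run via `rdd.map(transform_record)` or be expressed as DataFrame `select`/`alias` projections.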
Confidential, Waltham, MA
Big Data Sr. Consultant
Responsibilities:
- Defined the logic for visitor-stitching use-case scenarios, identifying Visits and Sub-Visits.
- Designed and implemented the ETL solution in Spark for visitor stitching.
- Processed data from AWS S3 buckets and loaded it into EDW tables.
- Applied optimization techniques gained through development experience, including persistence/caching and compression.
- Responsible for developing column mappings, transformations, and data profiling tasks.
- Responsible for scheduling ETL tasks in Airflow, an ETL task scheduler and dependency manager.
- Worked extensively on debugging and making changes to the Spark ETL solution.
- Configured Git repositories and used various sbt plugins to manage build dependencies.
- Developed a testing framework in Scala to test different visitor-stitching use cases.
- Deployed the built jar files to Qubole and verified that the Spark environment spun up correctly with the latest changes.
- Loaded data from Parquet files into APS tables using scheduled jobs.
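The core of visitor stitching, splitting a visitor's event stream into Visits by inactivity gap, can be sketched in a few lines. This is a minimal illustration under assumed rules (a 30-minute inactivity timeout; epoch-second timestamps), not the project's actual stitching logic:

```python
# Hypothetical visitor-stitching sketch: group one visitor's ordered event
# timestamps into Visits whenever the gap exceeds an inactivity timeout.
# The 30-minute timeout is an illustrative assumption.
VISIT_TIMEOUT = 30 * 60  # seconds

def stitch_visits(timestamps):
    """Group epoch-second timestamps into visits split by inactivity gaps."""
    visits = []
    current = []
    for ts in sorted(timestamps):
        if current and ts - current[-1] > VISIT_TIMEOUT:
            visits.append(current)  # gap too large: close the current visit
            current = []
        current.append(ts)
    if current:
        visits.append(current)
    return visits

events = [0, 60, 120, 4000, 4100]
print(stitch_visits(events))  # two visits: [[0, 60, 120], [4000, 4100]]
```

In Spark the same logic would run per visitor key, e.g. after a `groupByKey` or a windowed DataFrame aggregation.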
Confidential, Jersey City, NJ
Big Data Sr. Consultant
Responsibilities:
- To react to fraud in a timely manner, developed a lookup that returns any user's history within a few milliseconds, using HBase NoSQL tables and caches for quick retrieval of information.
- Worked on various machine-learning models applied to the data to find correlations between different entities.
- Hands-on experience with models such as Linear Regression, Decision Trees, Naïve Bayes, Random Forests, and K-means Clustering; worked with the MLlib and GraphX libraries and Spark Streaming.
- Loaded data into HBase using the HBase shell and the HBase client API.
- Very good understanding of Spark core concepts and internals gained through development experience: partitions, memory management, executor and driver functions, shuffling, optimization techniques, persistence/caching, and compression.
- Developed a data pipeline using Flume and Kafka to extract data and store it in HDFS.
- Used Maven extensively to build jar files for Java MapReduce programs and deployed them to the cluster.
- Created multi-module projects using sbt, developed the build scripts, and used various sbt plugins to manage build dependencies.
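The millisecond user-history lookup described above follows a common pattern: an in-process cache in front of a slower store. A minimal sketch, with the HBase table stubbed by a dictionary and all names hypothetical:

```python
# Hypothetical sketch of a fast user-history lookup: an LRU cache in front
# of a slower backing store (HBase in the real system; stubbed here).
from functools import lru_cache

HISTORY_STORE = {  # stand-in for an HBase table keyed by user id
    "user-1": ["login", "transfer", "logout"],
}

@lru_cache(maxsize=100_000)
def user_history(user_id: str):
    # A cache miss here would issue an HBase get; hits return from memory.
    return tuple(HISTORY_STORE.get(user_id, ()))

print(user_history("user-1"))   # first call: store lookup, result cached
print(user_history("user-1"))   # second call: served from the LRU cache
print(user_history.cache_info())
```

Returning an immutable tuple keeps the cached value safe from accidental mutation by callers, which matters when many fraud checks share the same cache.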
Confidential
Senior Big Data Developer
Responsibilities:
- Responsible for ingesting data into HDFS using Sqoop from relational databases (SQL Server and DB2).
- Responsible for reconciliation between current records in Hive and new change records.
- Responsible for performing extensive data validation using Hive.
- Created Sqoop jobs and Pig and Hive scripts to ingest data from relational databases and compare it with historical data.
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, MongoDB, other NoSQL stores, and a variety of portfolios.
- Adjusted memory settings for Map and Reduce tasks and applied other performance optimizations needed to make jobs execute within the agreed SLA time.
- Responsible for debugging failed jobs in production.
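The reconciliation between current Hive records and new change records amounts to an upsert merge by primary key. A hedged sketch of that idea (record shapes and the `id` key are illustrative assumptions; in Hive this is usually expressed as a join or a window over a full outer union):

```python
# Hypothetical reconciliation sketch: merge new change records into the
# current snapshot by primary key; changed keys win, others pass through.
def reconcile(current, changes, key="id"):
    """Upsert change records into the current snapshot, keyed by `key`."""
    merged = {row[key]: row for row in current}
    for row in changes:
        merged[row[key]] = row  # new or updated record replaces the old one
    return list(merged.values())

current = [{"id": 1, "amt": 10}, {"id": 2, "amt": 20}]
changes = [{"id": 2, "amt": 25}, {"id": 3, "amt": 30}]
print(reconcile(current, changes))
```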
Confidential
ETL Architect
Responsibilities:
- Utilized SQL*Loader to load flat files into database tables.
- Responsible for SQL tuning and optimization using Analyze, Explain Plan, TKPROF utility and optimizer hints.
- Responsible for performing code reviews.
- Designed and created ETL packages using SSIS with Oracle as the source and SQL Server as the destination.
- Installed, upgraded, and migrated databases from Oracle to MS SQL Server 2008.
- Transformed data from various data sources by creating SSIS packages using OLE DB connections.
Confidential
Senior Database Developer and Analyst
Responsibilities:
- Responsible for analyzing business functionality for the back-end processes.
- Developed SQL Server 2008 databases using T-SQL programming: queries, stored procedures, and views.
- Performed tuning and optimization of SQL queries using Explain Plan and optimizer hints.
- Created SQL reports and data extraction and data loading scripts for different databases and schemas.
- Developed SSIS packages using different task types, with error handling.
- Developed reports in SSRS on SQL Server 2005.
- Used SSIS to create ETL packages to validate, extract, transform, and load data into data warehouse and data mart databases.
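The validate/extract/transform/load flow described above can be illustrated in miniature. This sketch uses Python with an in-memory sqlite3 database as a stand-in warehouse; the table, columns, and validation rule are hypothetical, not the actual SSIS package design:

```python
# Hypothetical Validate -> Transform -> Load sketch, with sqlite3 standing in
# for the warehouse. All names and rules here are illustrative assumptions.
import sqlite3

rows = [("2019-01-01", "100.50"), ("bad-date", "x"), ("2019-01-02", "200.00")]

def valid(row):
    """Accept only YYYY-MM-DD dates paired with numeric amounts."""
    date, amount = row
    try:
        float(amount)
    except ValueError:
        return False
    return len(date) == 10 and date[4] == "-" and date[7] == "-"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (sale_date TEXT, amount REAL)")
clean = [(d, float(a)) for d, a in rows if valid((d, a))]  # validate + transform
conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", clean)  # load
conn.commit()
print(conn.execute("SELECT COUNT(*), SUM(amount) FROM fact_sales").fetchone())
```

Rejected rows would normally be routed to an error output for review rather than silently dropped, mirroring SSIS error-handling paths.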
Confidential
ETL Developer
Responsibilities:
- Estimation, analyzing FRDs, preparing design documents, writing unit test cases, coding, unit testing, and guiding team members.
- Responsible for package development and deployment.
- Responsible for troubleshooting production problems and writing queries for front-end designers.
- Used DTS for data population from different sources; created DTS packages for data conversions and to load data from flat files or Excel files.
- Fine-tuned complex SQL queries and stored procedures to increase performance.
- Worked as a developer creating complex stored procedures, DTS packages, triggers, cursors, tables, views, and other SQL joins and statements for applications.