Data Architect / Sr Big Data Developer Resume

Charlotte, NC

OBJECTIVE:

To obtain a challenging position as a Hadoop Architect that makes use of my creative abilities, analytical skills and strong knowledge of advanced technologies.

SUMMARY:

  • Over 10 years of experience in the full life cycle of the software design process, including requirements definition, prototyping, design, implementation (coding), testing, maintenance and documentation.
  • 5+ years of strong experience working on Apache Hadoop ecosystem components like MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Oozie, Greenplum, Teradata, Zookeeper, Flume and Spark with CDH4 and CDH5 distributions, plus Amazon Elastic Compute Cloud (EC2) cloud computing with AWS.
  • 7+ years of work experience in software development using Agile methodology.
  • Experience in using and developing ER diagrams, SQL, stored procedures and triggers using RDBMS packages like SQL Server 2000/2005, MS Access and Oracle.
  • Proficient in big data ingestion and streaming tools like Flume, Sqoop, Kafka, and Storm.
  • Experience with different data formats like JSON, Avro, Parquet, RC and ORC, and compression codecs like LZO.
  • Experience in using editors like Dreamweaver, HomeSite and Eclipse.
  • Hands-on experience working on NoSQL databases including HBase and its integration with the Hadoop cluster; also worked with mainframe and the CA7 scheduler.
  • Extensive experience in both MapReduce MRv1 and MapReduce MRv2 (YARN).
  • Extensive experience in HDFS, PIG, Hive, Zookeeper, UNIX and HBase.
  • Experience in collecting business requirements, writing functional requirements and test case documents.
  • Well versed in Object-Oriented Programming and the Software Development Life Cycle, from project definition to post-deployment.
  • Excellent written and verbal communication skills; experienced in interacting with clients/users to gather user requirements.
  • Ability to work well individually and under pressure; a self-motivated fast learner who is consistently responsible and deadline-oriented.

TECHNICAL SKILLS:

RDBMS/DBMS: MS SQL Server 2008R2/2008/2005, Oracle 11g/10g/9i/8i, MS Access, Excel.

Programming Languages: Hive, Pig, Hadoop, T-SQL, HTML, DHTML, Visual Basic, AJAX, Java, SQL, PL/SQL and Scala.

Software/Databases: MS SQL Server (DTS), Oracle 8, ODBC, OLTP, OLAP, MS SQL Server 2000 Enterprise Manager, SQL Query Analyzer, Web Services, SQL Profiler, MySQL.

Operating Systems: Windows XP/Vista/7, Windows 2000, Windows 2003/2008/2012 Enterprise Server, UNIX.

Tools: Business Objects, Crystal Reports, SAS, Erwin Data Modeler.

WORK EXPERIENCE:

Confidential, Charlotte, NC

Data Architect / Sr Big Data Developer

Responsibilities:

  • Worked extensively on Hadoop Components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, YARN, Spark and Map Reduce programming.
  • Analyzed the data that needed to be loaded into Hadoop and coordinated with the respective source teams to get table information and connection details.
  • Used Sqoop to import data from different RDBMS systems like Oracle and DB2 and loaded it into HDFS.
  • Created Hive tables and partitioned data for better performance; implemented Hive UDFs and performed tuning for better results (a minimal partitioning sketch follows this list).
  • Implemented optimized map joins to get data from different sources to perform cleaning operations before applying algorithms.
  • Developed workflows in Oozie to manage and schedule jobs on the Hadoop cluster, triggering daily, weekly and monthly batch cycles.
  • Responsible for designing logical and physical data models for various data sources on Confidential Redshift.
  • Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
  • Implemented POC to introduce Spark Transformations.
  • Used RDDs to perform transformations on data sets as well as actions like count, reduce and first (see the RDD sketch after this list).
  • Migrated an existing on-premises application to AWS.
  • Used AWS services like Amazon Elastic Compute Cloud (EC2) and S3 for small data sets.
  • Used CloudWatch Logs to move application logs to S3 and created alarms based on exceptions raised by applications.
  • Created analysis documents to capture table types (truncate-and-load or incremental load), frequency of updates, source database connection details, etc.
  • Documented all tables created to ensure all transactions are captured properly.
  • Analyzed data by performing Hive queries and running Pig scripts to study the transactional behavior of policies and plans.
  • Stored data in Amazon S3 and used multiple Amazon EMR clusters to process the same data set.
  • Worked with EMR clusters to efficiently and securely use Amazon S3 as an object store for Hadoop.
  • Developed shell scripts to move files (received through SFTP) from the landing zone server to HDFS, update the file tracker and send mails after execution is complete.
  • Participated in design and implementation discussions for developing the Cloudera 5 Hadoop ecosystem and supported the team when there were updates to Cloudera versions.
  • Worked in an Agile development environment using the Kanban methodology; participated in daily scrum and other design-related meetings.
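
Below is a minimal sketch of the Hive partitioning approach mentioned above, expressed through the Spark SQL API in Scala rather than the Hive CLI; the table and column names (policy_events, policy_events_staging, load_date) are hypothetical placeholders rather than the actual project schema.

    import org.apache.spark.sql.SparkSession

    object PartitionedHiveLoad {
      def main(args: Array[String]): Unit = {
        // Hive support lets Spark SQL create and populate Hive-managed tables.
        val spark = SparkSession.builder()
          .appName("PartitionedHiveLoad")
          .enableHiveSupport()
          .getOrCreate()

        // Target table, partitioned by load date so queries that filter on
        // load_date scan only the relevant partition directories.
        spark.sql(
          """CREATE TABLE IF NOT EXISTS policy_events (
            |  policy_id STRING,
            |  event_type STRING,
            |  amount DOUBLE)
            |PARTITIONED BY (load_date STRING)
            |STORED AS PARQUET""".stripMargin)

        // Dynamic partition insert from a hypothetical staging table that holds
        // the data imported from Oracle/DB2 via Sqoop.
        spark.sql("SET hive.exec.dynamic.partition=true")
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
        spark.sql(
          """INSERT INTO TABLE policy_events PARTITION (load_date)
            |SELECT policy_id, event_type, amount, load_date
            |FROM policy_events_staging""".stripMargin)

        spark.stop()
      }
    }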

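The RDD transformations and actions listed above might look roughly like the following Scala sketch; the HDFS path and record layout are invented for illustration.

    import org.apache.spark.{SparkConf, SparkContext}

    object RddActionsSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("RddActionsSketch"))

        // Hypothetical input: one comma-separated record per line in HDFS.
        val lines = sc.textFile("hdfs:///data/policies/*.csv")

        // Transformations are lazy; nothing executes until an action is called.
        val amounts = lines
          .map(_.split(","))
          .filter(_.length > 2)                 // drop malformed rows
          .map(fields => fields(2).toDouble)    // third field assumed to be an amount

        // Actions trigger the computation.
        val rowCount = amounts.count()          // number of records
        val total    = amounts.reduce(_ + _)    // sum of the amount column
        val first    = amounts.first()          // first element

        println(s"rows=$rowCount total=$total first=$first")
        sc.stop()
      }
    }
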
Technology: Hadoop, CDH, Map Reduce, Hive, Pig, Sqoop, HBase, Java, Spark, Oozie, Linux, Python, DB2, Oracle, AWS.

Confidential, Bentonville, AR

Team Lead / Sr Big Data Developer

Responsibilities:

  • Expertise with the tools in Hadoop Ecosystem including Pig, Hive, HDFS, Map Reduce, Sqoop, Storm, Spark, Kafka, Yarn, Oozie, and Zookeeper.
  • As a Hadoop Developer/Lead, responsible for designing, developing, testing, tuning and building a large-scale data processing system for data ingestion and data products that allow the client to improve the quality, velocity and monetization of enterprise-level data assets for both operational applications and analytical needs.
  • Creating technical documents about the project design, functionality, coding and implementation for business users and developers.
  • Designed, developed and maintained the ApShrink project to analyze how much Walmart loses through inventory shrinkage from processes like breakage, expiry of goods, shoplifting, etc.
  • Information gathering: interacted with end users to understand the requirements and business logic.
  • Loaded data from the mainframe VM to HDFS, and from HDFS to Greenplum, using CA7 jobs for the ApShrink project.
  • Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Designed and developed effective mechanism to automate existing processes using PostgreSQL psql.
  • Worked on finding data discrepancies in the Postgres databases with respect to already-loaded data.
  • Set up the Hadoop landing zone and Kerberos security, and created the process to load mainframe data into Hadoop and Hive tables.
  • Worked with key business stakeholders to understand use case requirements for data analysis.
  • Used Scala functional programming.
  • Created Hive partitioned managed tables for each incremental load; implemented partitioning, dynamic partitions and buckets in Hive.
  • Developed a generic Sqoop import utility to load data from various RDBMS sources like Teradata, DB2 and Greenplum (PostgreSQL).
  • Experience in database design and development using SQL Azure, Microsoft SQL Server and Microsoft Access.
  • Experience working with Azure SQL Database Import and Export Service.
  • Experience in deploying SQL Databases in AZURE.
  • Experience working with the SQL Database Migration Wizard, SQL Azure and Azure SQL Database Preview v12.
  • High level understanding of RDBMS concepts as well as Data Modeling and Azure SQL concepts.
  • Imported Walmart.com data from Teradata to Hadoop using Teradata Parallel Transporter (TPT) and TDCH utilities.
  • Developed Hive UDFs and .hql scripts, passing dynamic parameters using hivevar.
  • Wrote UNIX shell scripts using SFTP to load data from external sources to the UNIX box and then into HDFS.
  • Knowledge of Cassandra-Spark connector to load data to and from Cassandra.
  • Wrote UNIX scripts to load data from the Greenplum temp schema to production schemas.
  • Analyzed the volume of the existing batch process and designed the Kafka topics and partitions.
  • Implemented a Kafka consumer with Spark Streaming and Spark SQL using Scala.
  • Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark with Scala.
  • Experienced in handling large data sets using partitions, Spark in-memory capabilities, broadcasts in Spark, effective and efficient joins, transformations and other operations during the ingestion process itself.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Worked on a POC to compare processing time of Impala with Apache Hive for batch applications to implement the former in project.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Used the GPLoad utility to load data from Teradata and from external files supplied by third-party vendors into the Greenplum temp schema.
  • Wrote queries in PostgreSQL to create large tables and load big data into Greenplum.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly for building the common learner data model, which gets data from Kafka in near real time and persists it into Cassandra (see the streaming sketch after this list).
  • Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
  • Worked in PostgreSQL to convert weekly data to daily data.
  • Exported the data to Excel and CSV files for business users to perform analytics.
  • Used Zookeeper for various types of centralized configurations.
  • Used Sqoop to import data into Cassandra tables from different relational databases like Oracle and MySQL.
  • Consumed data from the Kafka queue using Spark.
  • Worked on data validation and post-deployment support.
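
A rough Scala sketch of the Kafka-to-Spark Streaming-to-Cassandra flow described above, assuming the spark-streaming-kafka-0-10 and spark-cassandra-connector libraries; the broker address, topic, keyspace, table and record layout are all hypothetical.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import com.datastax.spark.connector._   // adds saveToCassandra to RDDs

    object KafkaToCassandra {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("KafkaToCassandra")
          .set("spark.cassandra.connection.host", "cassandra-host")   // hypothetical host
        val ssc = new StreamingContext(conf, Seconds(10))             // 10-second micro-batches

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "kafka-host:9092",                  // hypothetical broker
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "learner-model",
          "auto.offset.reset"  -> "latest")

        // Subscribe to a hypothetical topic carrying pipe-delimited learner records.
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("learner_events"), kafkaParams))

        // Parse each message and persist every micro-batch into a Cassandra table.
        stream.map(record => record.value.split("\\|"))
          .filter(_.length >= 3)
          .map(f => (f(0), f(1), f(2).toDouble))
          .foreachRDD { rdd =>
            rdd.saveToCassandra("analytics", "learner_model",
              SomeColumns("learner_id", "event_type", "score"))
          }

        ssc.start()
        ssc.awaitTermination()
      }
    }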

Technology: Hadoop, Hive, Sqoop, Erwin, Zookeeper, Map Reduce, HDFS, Spark, Kafka, Scala, Greenplum, Teradata, Cassandra, UNIX, Mainframe and CA7, GitHub, Jira.

Confidential, Rockville, Maryland

AWS /SR. Big Data Developer

Responsibilities:

  • Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive and MapReduce.
  • Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Developed Pig scripts for change data capture and delta record processing between newly arrived data and data already existing in HDFS (a Spark-based sketch of the same delta idea follows this list).
  • Involved in loading data from UNIX file system to HDFS.
  • Worked on analyzing data with Hive and Pig and on real-time analytical operations using HBase.
  • Created views over HBase tables and used SQL queries to retrieve alerts and metadata.
  • Worked with the HBase NoSQL database.
  • Helped and directed testing team to get up to speed on Hadoop Data testing.
  • Worked on loading and transforming large sets of structured and semi structured data.
  • Implemented Map Reduce secondary sorting to get better performance for sorting results in Map Reduce programs.
  • Involved in data analysis using Hive and handled ad-hoc requests as per requirements.
  • Worked on user-defined functions in Hive to load data from HDFS and run aggregation functions on multiple rows.
  • Created different UDFs and UDAFs to analyze partitioned, bucketed data and compute various metrics for dashboard reporting, and stored them in different summary tables (a UDF sketch follows this list).
  • Used the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Created stored procedures, triggers and functions to operate on report data in MySQL.
  • Wrote back end code in Java to interact with the database using JDBC.
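
The change-data-capture logic above was implemented in Pig scripts; as an illustration of the same delta idea, here is a hedged Scala sketch against the Spark DataFrame API (the HDFS paths, the account_id key and the row_hash change-detection column are hypothetical).

    import org.apache.spark.sql.SparkSession

    object DeltaCapture {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("DeltaCapture").getOrCreate()
        import spark.implicits._

        // Hypothetical HDFS paths: the existing snapshot and the newly arrived extract.
        val existing = spark.read.parquet("hdfs:///warehouse/accounts/current").alias("e")
        val incoming = spark.read.parquet("hdfs:///landing/accounts/today").alias("i")

        // A full outer join on the business key exposes inserts, deletes and updates.
        val joined = existing.join(incoming, $"e.account_id" === $"i.account_id", "full_outer")

        val inserts = joined.filter($"e.account_id".isNull).select($"i.*")
        val deletes = joined.filter($"i.account_id".isNull).select($"e.*")
        val updates = joined
          .filter($"e.account_id".isNotNull && $"i.account_id".isNotNull &&
            $"e.row_hash" =!= $"i.row_hash")   // row_hash: hypothetical change-detection column
          .select($"i.*")

        // Persist only the delta (new and changed rows) for the downstream load.
        inserts.union(updates).write.mode("overwrite")
          .parquet("hdfs:///landing/accounts/delta_out")

        println(s"inserts=${inserts.count()} updates=${updates.count()} deletes=${deletes.count()}")
        spark.stop()
      }
    }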

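The UDFs and UDAFs above were written against the Hive UDF API; the following Scala sketch only illustrates the general register-and-apply pattern through Spark SQL, with an invented score_band function and hypothetical policy_scores and score_band_summary tables.

    import org.apache.spark.sql.SparkSession

    object UdfSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("UdfSketch")
          .enableHiveSupport()
          .getOrCreate()

        // Register a simple scalar UDF that buckets a numeric score into a band label.
        spark.udf.register("score_band", (score: Double) =>
          if (score >= 0.8) "HIGH" else if (score >= 0.5) "MEDIUM" else "LOW")

        // Apply it alongside built-in aggregates over a hypothetical Hive table,
        // writing the result to a summary table for dashboard reporting.
        spark.sql(
          """CREATE TABLE IF NOT EXISTS score_band_summary AS
            |SELECT score_band(score) AS band,
            |       COUNT(*)          AS policies,
            |       AVG(score)        AS avg_score
            |FROM policy_scores
            |GROUP BY score_band(score)""".stripMargin)

        spark.stop()
      }
    }
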
Technology: Hadoop, CDH, Map Reduce, Hive, Pig, Sqoop, HBase, Java, Spark, Oozie, Linux, Python, DB2, Oracle, AWS.

Confidential, Salem, OR

Program Analyst/SQL Developer

Responsibilities:

  • Involved in the Software Development Life Cycle (SDLC) and created UML diagrams like use case diagrams, class diagrams and sequence diagrams to represent the detailed design phase.
  • Created new tables, views, indexes and user-defined functions.
  • Performed daily database backup and restoration and monitored the performance of the database server.
  • Actively designed the database to speed up certain daily jobs and stored procedures.
  • Optimized query performance by creating indexes.
  • Developed Stored Procedures, Views to be used to supply data for all reports. Complex formulas were used to show derived fields and to format data based on specific conditions.
  • Involved in administration of SQL Server by creating users and login IDs with appropriate roles and granting privileges to users and roles. Worked on authentication modules to provide controlled access to users of various modules.
  • Created joins and sub-queries for complex queries involving multiple tables.
  • Developed stored procedures and triggers using PL/SQL to calculate and update tables that implement business logic.
  • Responsible for report generation using SQL Server Reporting Services (SSRS) and Crystal Reports based on business requirements.
  • Developed complex SQL queries to perform efficient data retrieval operations including stored procedures, triggers etc.
  • Designed and Implemented tables and indexes using SQL Server.

Technology: Eclipse, Oracle, HTML, PL/SQL, XML, SQL.

Confidential, Gaithersburg, Maryland

Program Analyst/SQL Developer

Responsibilities:

  • Developed SQL Scripts to perform different joins, sub queries, nested querying, Insert/Update and Delete data in MS SQL database tables.
  • Experience in writing PL/SQL and in developing and implementing Stored Procedures, Packages and Triggers.
  • Experience with data modeling principles, database design and programming, and creating E-R diagrams and data relationships to design a database.
  • Responsible for designing advanced SQL queries, procedures, cursors and triggers.
  • Built data connections to the database using MS SQL Server.
  • Worked on project to extract data from XML file to SQL table and generate data file reporting using SQL Server 2008.
  • Used Tomcat web server for development purpose.
  • Involved in creation of Test Cases for Unit Testing.

Technology: PL/SQL, MySQL, SQL Server 2008 (SSRS & SSIS), Visual Studio 2000/2005, MS Excel.
