Data Lake Engineer Resume
Framingham, MA
SUMMARY
- Overall 10 years of IT industry experience in product development, implementation and maintenance of various applications using Big Data ecosystems in Linux environments
- Overall 6 years of experience in analytics using Big Data technologies, with hands-on experience in storing, querying, processing and analyzing data
- Comprehensive work experience in implementing Big Data projects using Apache Hadoop, Pig, Hive, HBase, Spark, Sqoop, Flume, Zookeeper, Oozie
- Experience with distributed systems, large-scale non-relational data stores and multi-terabyte data warehouses
- Excellent knowledge of Hadoop architecture: Hadoop Distributed File System (HDFS), Job Tracker, Task Tracker, Name Node, Data Node and the MapReduce programming paradigm
- Hands-on experience building data pipelines using Hadoop components Sqoop, Hive, Pig, MapReduce, Spark, Spark SQL
- Hands-on experience in various Big Data application phases like Data Ingestion, Data Analytics and Data Visualization
- Experience in developing efficient solutions to analyze large data sets
- Experience working on Hortonworks / Cloudera / MapR distributions
- Extensively worked on MRv1 and MRv2 Hadoop architectures
- Experience working on Spark, RDDs, DAGs, Spark SQL and Spark Streaming
- Experience in importing and exporting data using Sqoop between HDFS and Relational Database Management Systems
- Populated HDFS with huge amounts of data using Apache Kafka and Flume
- Excellent knowledge of data mapping, extracting, transforming and loading from different data sources
- Worked with different file formats such as TextFile, SequenceFile, Avro, ORC and Parquet for Hive querying and processing
- Experience in developing custom MapReduce Programs in Java using Apache Hadoop for analyzing Big Data as per the requirement; writing Python automation scripts for applications
- Well experienced in data transformation using custom MapReduce, Hive and Pig scripts for different types of file formats
- Expertise in extending Hive and Pig core functionality by writing custom UDFs and UDAFs
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning and bucketing
- Experience building solutions with NoSQL databases, such as HBase, Cassandra, MongoDB
- Experienced in Apache Spark for implementing advanced procedures such as text analytics and processing, leveraging in-memory computing capabilities in Scala
- Experience in Kafka installation & integration with Spark Streaming
- Used Spark Streaming to divide streaming data into micro-batches as input to the Spark engine for batch processing (see the sketch following this summary)
- Experience in designing both time driven and data driven automated workflows using Oozie
- Good understanding of ZooKeeper for monitoring and managing Hadoop jobs
- Good understanding of ETL tools and how they can be applied in a Big Data environment
- Experience monitoring MapReduce jobs and YARN applications
- Experience working with Microsoft Azure cloud services: Azure Data Lake Storage Gen1 & Gen2, Azure Data Factory and other services
- Experience working with Databricks notebooks & integration of the notebooks with Azure Data Factory
- Hands-on experience with Amazon Elastic MapReduce (EMR), S3 storage, EC2 instances and Data Warehousing
- Experience with RDBMS and writing SQL and PL/SQL scripts used in stored procedures
- Used Git for source code and version control management
- Strong understanding of Agile and Waterfall SDLC methodologies
- Experience working with both small and large groups; successful in meeting new technical challenges and finding solutions that meet customer needs
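The Kafka and Spark Streaming bullets above describe micro-batching an incoming stream for the Spark engine. The sketch below is a minimal illustration of that pattern written against the spark-streaming-kafka-0-10 API; the broker, topic, consumer group and HDFS paths are assumptions, not details from the projects described here.

```scala
// Minimal sketch, assuming the broker, topic, consumer group and HDFS paths below;
// Spark Streaming groups the Kafka stream into micro-batches for the Spark engine.
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs")
    val ssc = new StreamingContext(conf, Seconds(30))   // 30-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",             // hypothetical broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "hdfs-loader",                       // hypothetical consumer group
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)  // hypothetical topic
    )

    // Each micro-batch arrives as an RDD and is appended to HDFS for batch processing
    stream.map(_.value).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) rdd.saveAsTextFile(s"hdfs:///data/raw/events/${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```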
TECHNICAL SKILLS
Big Data Technologies: HDFS, YARN, MapReduce, Pig, Hive, HBase, Spark, Spark SQL, Spark Streaming, Sqoop, Flume, Kafka, ZooKeeper, Oozie
Big Data Distributions: Hortonworks, Cloudera, MapR, Amazon Elastic MapReduce (EMR)
Programming Languages: Java, Python, Scala, C++, R, JavaScript, Shell Script
Operating Systems: Linux, Windows, Unix
RDBMS: Oracle, MySQL, MS SQL Server
NoSQL Databases: HBase, Cassandra, MongoDB
Frameworks: Spring, Hibernate, Struts
Web Servers: Apache Tomcat, WebSphere, WebLogic
Version Control: Git, SVN, CVS
Integrated Development Environments (IDEs): Spyder, Java Eclipse IDE, NetBeans, Microsoft SQL Studio
Web Technologies: HTML, CSS, Bootstrap, JavaScript, DOM, XML, Servlets
PROFESSIONAL EXPERIENCE
Confidential, Framingham MA
Data Lake Engineer
Responsibilities:
- Worked in developing data lake for the GBT (Global Business Transactions) reporting team
- Worked in developing hierarchy application for the ECH (Enterprise Customer Hierarchy) team
- Worked in developing a unified data platform for the SVC (Single View Customer) team
- Involved in complete project life cycle starting from design discussion to production deployment
- Worked closely with the business team to gather their requirements
- Assisted in designing and developing the data lake and ETL using Python and the Hadoop ecosystem
- Coordinated with clients' developers in tuning query performance for all services
- Involved in developing queries in MySQL, Oracle and DB2
- Worked with Hadoop components HDFS, MapReduce, Hive, Sqoop, Hue and Kafka for Couchbase NoSQL data extraction
- Worked with Microsoft Azure cloud services to migrate on-premises data from RDBMS sources (PostgreSQL) and cloud-based FTP servers to Azure Data Lake Storage Gen1 & Gen2
- Worked with Azure Databricks notebooks for compute, using Spark RDDs and Spark SQL processing, and integrated the notebooks into Azure Data Factory pipelines (see the sketch following this section)
- Tested the code performance in development and Quality Assurance environments
- Responsible for supporting the client after production release
- Followed Agile Methodologies while working on the project
Environment: Hadoop, Spark, HDFS, MapReduce, YARN, Hive, Hue, Sqoop, Kafka, SQL, GitHub, Python, Linux, Tidal Scheduler, Microsoft Azure Data Lake Storage (Gen1 & Gen2), Databricks notebooks, Spark RDDs, Spark SQL, Oracle, MySQL, PostgreSQL and DB2 relational databases, Couchbase NoSQL database
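A minimal sketch of the Databricks-to-Data-Factory pattern described above: a Scala notebook reads a raw extract from Azure Data Lake Storage Gen2, applies a Spark SQL transformation, and writes a curated output back to the lake, with the Data Factory pipeline running the notebook as an activity. The storage account, container, paths and column names are illustrative assumptions.

```scala
// Databricks notebook cell (Scala). Account, container, paths and column names are
// illustrative assumptions; credentials would normally come from a secret scope.
val rawPath = "abfss://raw@examplelake.dfs.core.windows.net/gbt/transactions/"

// Read the landed on-premises extract into a DataFrame (`spark` is predefined in Databricks)
val transactions = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv(rawPath)

// Spark SQL processing step: aggregate to the reporting grain
transactions.createOrReplaceTempView("transactions")
val daily = spark.sql(
  """SELECT customer_id, to_date(txn_ts) AS txn_date, SUM(amount) AS total_amount
    |FROM transactions
    |GROUP BY customer_id, to_date(txn_ts)""".stripMargin)

// Write the curated output back to the lake; the Azure Data Factory pipeline
// simply invokes this notebook as a pipeline activity.
daily.write.mode("overwrite")
  .parquet("abfss://curated@examplelake.dfs.core.windows.net/gbt/daily_totals/")
```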
Confidential, Chicago IL
Sr. Hadoop Developer
Responsibilities:
- Involved in complete project life cycle starting from design discussion to production deployment
- Worked closely with the business team to gather their requirements and new support features
- Involved in running POCs on different use cases of the application and maintained a standard document of best coding practices
- Designed the data lake on a 16-node cluster using the Hortonworks distribution
- Responsible for building scalable distributed data solutions using Hadoop
- Installed, configured and implemented high availability Hadoop Clusters with required services (HDFS, Hive, HBase, Spark, ZooKeeper)
- Implemented Kerberos for authenticating all the services in Hadoop Cluster
- Configured ZooKeeper to coordinate the servers in clusters to maintain the data consistency
- Involved in designing the end-to-end data pipeline to ingest data into the data lake
- Wrote scripts to automate application deployments and configuration, and monitored YARN applications
- Configured and developed Sqoop scripts to migrate the data from relational databases like Oracle, Teradata to HDFS
- Used Flume for collecting and aggregating large amounts of streaming data into HDFS
- Wrote MapReduce jobs in Java to parse the raw data, populate staging tables and store the refined data
- Developed MapReduce programs as part of predictive analytical model development
- Built reusable Hive UDF libraries for business requirements, enabling business analysts to use these UDFs in Hive queries
- Created different staging tables like ingestion tables and preparation tables in Hive environment
- Optimized Hive queries and used Hive on top of Spark engine
- Worked on Sequence files, Map side joins, Bucketing, Static and Dynamic Partitioning for Hive performance enhancement and storage improvement
- Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Spark SQL and Scala
- Worked on the Spark core and Spark SQL modules of Spark extensively
- Created tables in HBase to store data arriving in variable formats from different upstream sources
- Leveraged AWS cloud services such as EC2, auto scaling and VPC (Virtual Private Cloud) to build secure, highly scalable and flexible systems that handled expected and unexpected load bursts and could evolve quickly during development iterations
- Developed batch scripts to fetch data from AWS S3 storage and perform the required transformations in the Spark framework using Scala (see the sketch following this section)
- Configured various workflows to run on top of Hadoop using Oozie; these workflows comprise heterogeneous jobs such as Hive, Sqoop and MapReduce
- Experience in managing and reviewing Hadoop log files
- Utilized capabilities of Tableau such as Data extracts, Data blending, Forecasting, Dashboard actions and table calculations to build dashboards
- Followed Agile Methodologies while working on the project
- Performed bug fixing and 24X7 production support for running the processes
Environment: Java, Scala, Hadoop, Hortonworks, AWS, HDFS, YARN, MapReduce, Hive, Spark, Kafka, Sqoop, Oozie, ZooKeeper, Oracle, Teradata, MySQL
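A minimal sketch of the S3-to-Spark batch step and the Hive-to-Spark conversion described above. The bucket, paths, table and column names are illustrative assumptions; the same aggregation is shown once as Spark SQL and once as RDD transformations.

```scala
// Minimal sketch; bucket, paths, database/table and column names are assumptions.
import org.apache.spark.sql.SparkSession

object S3BatchTransform {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("s3-batch-transform")
      .enableHiveSupport()
      .getOrCreate()

    // Fetch the raw extract from AWS S3 storage
    val orders = spark.read.parquet("s3a://example-raw-bucket/orders/")

    // A HiveQL-style aggregation expressed as Spark SQL
    orders.createOrReplaceTempView("orders")
    val byRegion = spark.sql(
      "SELECT region, COUNT(*) AS order_cnt FROM orders GROUP BY region")

    // The same logic written as RDD transformations, for jobs kept on the RDD API
    val byRegionRdd = orders.select("region").rdd
      .map(row => (row.getString(0), 1L))
      .reduceByKey(_ + _)

    // Persist the curated result back into the Hive warehouse
    // (assumes a "curated" database already exists)
    byRegion.write.mode("overwrite").saveAsTable("curated.orders_by_region")
    spark.stop()
  }
}
```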
Confidential, Washington DC
Hadoop Developer
Responsibilities:
- Experience with the complete SDLC process, including staging, code reviews, source code management and the build process
- Implemented Big Data platforms as data storage, retrieval and processing systems
- Developed data pipeline using Kafka, Sqoop, Hive and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis
- Involved in managing nodes on the Hadoop cluster and monitoring cluster job performance using Cloudera Manager
- Wrote Sqoop scripts for importing and exporting data into HDFS and Hive
- Wrote MapReduce jobs to discover trends in data usage by the users
- Loaded and transformed large sets of structured, semi-structured and unstructured data using Pig
- Experienced in using Pig for transformations, event joins, filtering and pre-aggregations before storing the data in HDFS
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting
- Involved in developing Hive UDFs for functionality not available out of the box in Hive
- Created sub-queries for filtering and faster query execution
- Experienced in migrating HiveQL to Impala to minimize query response time
- Used HCatalog to access Hive table metadata from MapReduce and Pig scripts
- Experience in writing and tuning Impala queries, creating views for ad-hoc and business processing
- Experience loading and transforming large amounts of structured and unstructured data into HBase, with exposure to handling automatic failover in HBase
- Ran POCs in Spark to benchmark the implementation
- Developed Spark jobs using Scala in test environment for faster data processing and querying
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala (see the sketch following this section)
- Configured big data workflows to run on top of Hadoop using Oozie; these workflows comprise heterogeneous jobs such as Pig, Hive and Sqoop, with cluster coordination handled through ZooKeeper
- Hands on experience in Tableau for Data Visualization and analysis on large data sets, drawing various conclusions
- Involved in developing a test framework for data profiling and validation using interactive queries, collecting all test results into audit tables to compare results over time
- Documented all requirements, code and implementation methodologies for review and analysis
- Extensively used GitHub as a code repository and Phabricator for managing day to day development process and to keep track of the issues
Environment: Java, Scala, Hadoop, Spark, HDFS, MapReduce, Yarn, Hive, Pig, Impala, Oozie, Sqoop, Flume, Kafka, Teradata, SQL, GitHub, Phabricator, Amazon Web Services
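A minimal sketch of the MapReduce-to-Spark migration mentioned above: a classic map/reduce aggregation over usage logs rewritten as Spark RDD transformations in Scala. The input and output paths and the tab-separated log layout (userId, bytes) are illustrative assumptions.

```scala
// Minimal sketch; paths and the tab-separated log layout are assumptions.
import org.apache.spark.sql.SparkSession

object UsageByUser {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("usage-by-user").getOrCreate()
    val sc = spark.sparkContext

    val logs = sc.textFile("hdfs:///data/usage/logs/")

    val usage = logs
      .map(_.split("\t"))                                 // map phase: parse each record
      .filter(f => f.length >= 2 && f(1).matches("\\d+")) // drop malformed lines
      .map(f => (f(0), f(1).toLong))                      // emit (userId, bytes)
      .reduceByKey(_ + _)                                 // reduce phase: sum bytes per user

    usage.saveAsTextFile("hdfs:///data/usage/bytes_by_user")
    spark.stop()
  }
}
```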
Confidential
Hadoop Developer
Responsibilities:
- Worked on a Hortonworks cluster, which provides an open-source platform based on Apache Hadoop for analyzing, storing and managing big data
- Worked with analysts to determine and understand business requirements
- Loaded and transformed large datasets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts
- Developed data pipeline using Flume, Sqoop, Pig and MapReduce to ingest customer data and financial histories into HDFS for analysis
- Used MapReduce and Flume to load, aggregate, store and analyze web log data from different web servers
- Created MapReduce programs to handle semi-structured and unstructured data such as XML, JSON and Avro data files, and sequence files for log files
- Involved in submitting and tracking MapReduce jobs using Job Tracker
- Wrote Pig Latin scripts for data cleansing, ETL operations and query optimization of existing scripts
- Wrote Hive UDFs to sort struct fields and return complex data types (see the sketch following this section)
- Created Hive tables from JSON data using data serialization frameworks such as Avro
- Experience writing reusable custom Hive and Pig UDFs in Java and using existing UDFs from Piggybank and other sources
- Experience working with the NoSQL database HBase for real-time data analytics
- Integrated Hive tables with HBase to perform row-level analytics
- Developed Oozie workflows for daily incremental loads, which Sqoop data from Teradata and Netezza and import it into Hive tables
- Involved in performance tuning using execution engines such as Tez
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files
- Implemented Daily Cron jobs that automate parallel tasks of loading the data into HDFS using AutoSys and Oozie coordinator jobs
- Developed a suite of unit test cases for Mapper, Reducer and Driver classes using the MR testing library
Environment: Hortonworks, Java, Hadoop, HDFS, MapReduce, Tez, Hive, Pig, Oozie, Sqoop, Flume, Teradata, Netezza, Tableau
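The Hive UDF bullets above refer to UDFs written in Java; the sketch below shows the same simple-UDF shape in Scala for consistency with the other examples here, since the Hive UDF API is JVM-based. The function name and behavior (normalizing a free-text field) are illustrative assumptions rather than the actual UDFs from this project.

```scala
// Illustrative Hive simple UDF; the function name and normalization logic are assumptions.
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

class NormalizeText extends UDF {
  // Hive resolves evaluate() by reflection and calls it once per row
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toLowerCase)
  }
}

// Registered from HiveQL after packaging into a jar, for example:
//   ADD JAR /path/to/udfs.jar;
//   CREATE TEMPORARY FUNCTION normalize_text AS 'NormalizeText';
//   SELECT normalize_text(customer_name) FROM staging.customers;
```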
Confidential
Hadoop Developer
Responsibilities:
- Installed the Cloudera distribution of the Hadoop cluster and the HDFS, Pig, Hive, Sqoop, Flume and MapReduce services
- Responsible for providing an open-source platform based on Apache Hadoop for analyzing, storing and managing big data
- Loaded and transformed large sets of structured, semi-structured and unstructured data
- Responsible for managing data coming from different sources
- Imported and exported data into HDFS and Hive using Sqoop
- Wrote Hive queries
- Involved in loading data from the UNIX file system into HDFS (see the sketch following this section)
- Created Hive tables, loaded them with data and wrote queries that run internally as MapReduce jobs, performing data analysis per business requirements
- Worked with analysts to determine and understand business requirements
- Loaded and transformed large datasets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts
- Developed data pipeline using Flume, Sqoop, Pig and MapReduce to ingest customer data and financial histories into HDFS for analysis
- Used MapReduce and Flume to load, aggregate, store and analyze web log data from different web servers
- Created MapReduce programs to handle semi-structured and unstructured data such as XML, JSON and Avro data files, and sequence files for log files
- Involved in submitting and tracking MapReduce jobs using Job Tracker
- Wrote Pig Latin scripts for data cleansing, ETL operations and query optimization of existing scripts
- Wrote Hive UDFs to sort struct fields and return complex data types
- Created Hive tables from JSON data using data serialization frameworks such as Avro
- Experience writing reusable custom Hive and Pig UDFs in Java and using existing UDFs from Piggybank and other sources
- Experience working with the NoSQL database HBase for real-time data analytics
- Integrated Hive tables with HBase to perform row-level analytics
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files
- Developed unit test cases for Mapper, Reducer and Driver classes using the MR testing library
- Supported operations team in Hadoop cluster maintenance including commissioning and decommissioning nodes and upgrades
- Provided technical assistance to all development projects
- Hands-on experience with Qlik Sense for Data Visualization and Analysis on large data sets, drawing various insights
- Created dashboards using Qlik Sense and performed Data extracts, Data blending, Forecasting, and table calculations
Environment: Hortonworks, Java, Hadoop, HDFS, MapReduce, Hive, Pig, Oozie, Sqoop, Flume, Netezza, Qlik Sense
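A minimal sketch of the UNIX-filesystem-to-HDFS load mentioned above, using the Hadoop FileSystem API from Scala; the source file, target directory and copy options are illustrative assumptions (the same step is also commonly done with hdfs dfs -put).

```scala
// Minimal sketch; source and target paths are assumptions.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object LoadToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()                 // picks up core-site.xml / hdfs-site.xml
    val fs = FileSystem.get(conf)

    val local = new Path("/data/exports/customers.csv")    // file on the Unix host
    val target = new Path("/user/etl/staging/customers/")  // HDFS staging directory

    if (!fs.exists(target)) fs.mkdirs(target)
    // delSrc = false, overwrite = true: keep the local copy, replace any stale HDFS file
    fs.copyFromLocalFile(false, true, local, target)
    fs.close()
  }
}
```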
Confidential
Java Developer
Responsibilities:
- Built the application based on Rational Unified Process (RUP)
- Analyzed and developed UML diagrams with Rational Rose, including class diagrams, sequence diagrams, use case diagrams and activity diagrams
- Implemented the Middle-Tier employing design patterns like MVC, Business Delegate, Service Locator, Session Façade, Data Access Objects (DAO’s)
- Developed using MVC architecture, employing the Struts Framework and using the Validator and Tiles frameworks as plug-ins with Struts
- Developed user interface using JSP, JSP Tag libraries (JSTL) and Struts Tag Libraries
- Used EJB’s in the application and developed Session beans to house business login at the middle tier level
- Used Java Message Service (JMS) for reliable and asynchronous exchange of important information
- Used Hibernate in data access layer to access and update the information in database
- Implemented various XML technologies like XML schemas, JAXB parsers for cross platform data transfer
- Used JSON to pass objects between web pages and server-side application
- Used XSL-FO to generate PDF reports
- Extensively worked on XML parsers (SAX/DOM)
- Used WSDL and SOAP protocol for Web Services implementation
- Used JDBC to access DB2 UDB database for accessing customer information
- Developed application level logging using Log4J
- Used CVS for version control and JUnit for unit testing
- Involved in development of Tables, Indices, Stored procedures, Database Triggers and Functions
- Involved in documenting the application
Environment: J2EE 1.7, WebSphere Application Server v8.0, RAD, JSP 2.0, EJB 3.1, Struts 2.0, JMS, JSON, JDBC, JNDI, XML, XSL, XSLT, XSL-FO, WSDL, SOAP, Hibernate 4.0, RUP, Rational Rose (2000), Log4J, JUnit, CVS, IBM DB2 v8.2, Red Hat Linux, RESTful web services