Sr. Big Data Developer Resume
Merrimack, NH
SUMMARY
- Over 10 years of experience as a Sr. Big Data Developer with skills in analysis, design, development, testing, and deployment of various software applications.
- Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2, which provide fast and efficient processing for Teradata big data analytics.
- Experience collecting log and JSON data into HDFS using Flume and processing the data with Hive/Pig.
- Extensive experience developing Spark Streaming jobs using RDDs (Resilient Distributed Datasets) and Spark SQL as required.
- Experience developing Java MapReduce jobs for data cleaning and data manipulation as required by the business.
- Strong knowledge of Hadoop ecosystem components including HDFS, Hive, Oozie, HBase, Pig, Sqoop, Zookeeper, etc.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Good knowledge of using Hibernate for mapping Java classes to database tables and using Hibernate Query Language (HQL).
- Proficient in designing and implementing data structures and using common business intelligence tools for data analysis.
- Extensive experience writing Storm topologies that accept events from Kafka producers and emit them into Cassandra.
- Excellent working knowledge of data modeling tools such as Erwin, PowerDesigner, and ER/Studio.
- Proficient working experience with big data tools such as Hadoop, Azure Data Lake, and AWS Redshift.
- Strong experience in Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export.
- Excellent technical and analytical skills with a clear understanding of design goals for OLTP development and dimensional modeling for OLAP.
- Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
- Hands-on experience configuring and working with Flume to load data from multiple sources directly into HDFS.
- Good experience developing MapReduce jobs in Java/J2EE for data cleansing, transformation, pre-processing, and analysis.
- Excellent experience in installing and running various Oozie workflows and automating parallel job executions.
- Experience with Spark, Spark SQL, Spark Streaming, Spark GraphX, and Spark MLlib.
- Extensive development experience in IDEs such as Eclipse, NetBeans, and IntelliJ.
- Extensive experience with advanced J2EE frameworks such as Spring, Struts, JSF, and Hibernate.
- Expertise in JavaScript, JavaScript MVC patterns, Object Oriented JavaScript Design Patterns and AJAX calls.
- Installation, configuration, and administration experience on big data platforms such as Cloudera (Cloudera Manager) and MapR (MCS).
- Strong knowledge of NoSQL column-oriented databases such as HBase and their integration with Hadoop clusters.
- Good knowledge of coding in SQL, SQL*Plus, T-SQL, PL/SQL, and stored procedures/functions.
- Experience with all stages of the SDLC and the Agile development model, from requirement gathering through deployment and production support.
- Experience understanding existing systems and providing maintenance and production support on technologies such as Java, J2EE, and various databases (Oracle, SQL Server).
- Experience working with Hortonworks and Cloudera environments.
- Good knowledge of implementing various data processing techniques using Apache HBase to handle and format data as required.
- Experience developing custom UDFs for Pig and Hive to incorporate Python/Java methods and functionality into Pig Latin and HQL (HiveQL).
- Experience with the Oozie scheduler in setting up workflow jobs composed of MapReduce and Pig jobs.
- Knowledge of the architecture and functionality of NoSQL databases such as HBase, Cassandra, and MongoDB.
- Good understanding of querying datasets, filtering, aggregating, joining disparate datasets, and producing ranked or sorted results using Spark RDDs, Spark DataFrames, Spark SQL, Hive, and Impala (see the sketch after this list).
- Good at writing custom RDDs in Scala and implementing design patterns to improve performance.
- Experience analyzing large volumes of data using Hive Query Language and assisting with performance tuning.
- Experience with middleware architectures built on Sun Java technologies such as J2EE, JSP, and Servlets.
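Illustrative sketch of the Spark DataFrame filtering, joining, aggregation, and ranking work described above; all database, table, and column names are hypothetical and not tied to a specific project:

```scala
// Minimal sketch: join two datasets, aggregate, and rank with Spark SQL.
// Names such as sales_db.orders and /data/warehouse/customers are hypothetical.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window

object RankedSpendReport {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RankedSpendReport")
      .enableHiveSupport()
      .getOrCreate()

    // Read two disparate datasets: one from Hive, one from Parquet on HDFS.
    val orders    = spark.table("sales_db.orders")                   // order_id, customer_id, amount, region
    val customers = spark.read.parquet("/data/warehouse/customers")  // customer_id, name, region

    // Filter, join, and aggregate.
    val spendByCustomer = orders
      .filter(col("amount") > 0)
      .join(customers, Seq("customer_id"))
      .groupBy("customer_id", "name", "region")
      .agg(sum("amount").as("total_spend"))

    // Rank customers within each region by total spend and keep the top 10.
    val ranked = spendByCustomer
      .withColumn("rank", dense_rank().over(Window.partitionBy("region").orderBy(col("total_spend").desc)))
      .filter(col("rank") <= 10)

    ranked.write.mode("overwrite").saveAsTable("sales_db.top_customers_by_region")
    spark.stop()
  }
}
```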
TECHNICAL SKILLS
Big Data Ecosystem: MapReduce, HDFS, Hive 2.3, HBase 1.2, Pig, Sqoop, Flume 1.8, HDP, Oozie, Zookeeper, Spark, Kafka, Storm, Hue
Hadoop Distributions: Cloudera (CDH3, CDH4, CDH5), Hortonworks
Cloud Services: Amazon AWS, EC2, Redshift, MS Azure
Relational Databases: Oracle 12c, MySQL, MS SQL Server 2016
NoSQL Databases: HBase, Cassandra, and MongoDB
Version Control: GIT, GitLab, SVN
Java/J2EE Technologies: Servlets, JSP, JDBC, JSTL, EJB, JAXB, JAXP, JMS, JAX-RPC, JAX-WS
Programming Languages: Java, Scala, Python, SQL, PL/SQL, HiveQL, UNIX Shell Scripting.
Software Development & Testing Life cycle: UML, Design Patterns (Core Java and J2EE), Software Development Lifecycle (SDLC), Waterfall Model and Agile, STLC
Web Technologies: JavaScript, CSS, HTML and JSP.
Operating Systems: Windows, UNIX/Linux and Mac OS.
Build Management Tools: Maven, Ant.
IDE & Command line tools: Eclipse, IntelliJ, Toad and NetBeans.
PROFESSIONAL EXPERIENCE
Confidential, Merrimack, NH
Sr. Big Data Developer
Responsibilities:
- As a Sr. Big Data Developer, worked on Hadoop ecosystem components including Hive, MongoDB, Zookeeper, and Spark Streaming with the MapR distribution.
- Developed Big Data solutions focused on pattern matching and predictive modeling.
- Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.
- Involved in Agile methodologies, daily scrum meetings, and sprint planning.
- Implemented Security in Web Applications using Azure and deployed Web Applications to Azure.
- Involved in writing Spark applications using Scala to perform various data cleansing, validation, transformation and summarization activities according to the requirement.
- Loaded data into Spark RDDs and performed in-memory computation to generate output as per the requirements.
- Used Kibana, an open-source browser-based analytics and search dashboard for Elasticsearch.
- Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
- Used the Java Persistence API (JPA) framework for object-relational mapping based on POJO classes.
- Upgraded the Hadoop Cluster from CDH3 to CDH4, setting up High Availability Cluster and integrating Hive with existing applications.
- Designed and developed a flattened view (merged and flattened dataset) de-normalizing several datasets in Hive/HDFS, consisting of key attributes consumed by the business and other downstream systems.
- Worked on NoSQL support for enterprise production, loading data into HBase using Impala and Sqoop.
- Created Hive tables and loaded and analyzed data using Hive queries.
- Wrote Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Developed Hive queries to process the data and generate the data cubes for visualizing.
- Performed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Used Hadoop YARN to perform analytics on data in Hive.
- Developed and maintained batch data flows using HiveQL and Unix scripting.
- Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data to HDFS using Scala (see the sketch after this list).
- Developed SQL scripts using Spark for handling different data sets and verifying the performance over MapReduce jobs.
- Responsible for fetching real time data using Kafka and processing using Spark and Scala.
- Worked on Kafka to import real time weblogs and ingested the data to Spark Streaming.
- Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations.
- Maintained Hadoop, Hadoop ecosystems, and database with updates/upgrades, performance tuning and monitoring.
- Developed customized Hive UDFs and UDAFs in Java, established JDBC connectivity with Hive, and developed and executed Pig scripts and Pig UDFs.
- Used Oozie and Zookeeper operational services for coordinating cluster and scheduling workflows.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala and Python.
- Developed and executed data pipeline testing processes and validated business rules and policies.
- Built code for real time data ingestion using Java, MapR-Streams (Kafka) and STORM.
- Extensively used jQuery to provide a dynamic user interface and for client-side validations.
- Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it.
- Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and Spark.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Specified the cluster size, resource pool allocation, and Hadoop distribution by writing specification files in JSON format.
- Involved in running Hadoop jobs for processing millions of records of text data.
- Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries.
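A minimal sketch of the Kafka direct-stream ingestion into HDFS with Spark Streaming and Scala described above; broker addresses, the topic name, consumer group, and HDFS path are hypothetical:

```scala
// Minimal sketch: Kafka direct stream into Spark Streaming, persisted to HDFS.
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object WeblogIngest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WeblogIngest")
    val ssc  = new StreamingContext(conf, Seconds(30))   // 30-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092,broker2:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "weblog-ingest",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("weblogs"), kafkaParams))

    // Keep only the message payload and append each batch to HDFS as text files.
    stream.map(record => record.value)
      .foreachRDD { (rdd, time) =>
        if (!rdd.isEmpty())
          rdd.saveAsTextFile(s"hdfs:///data/raw/weblogs/batch_${time.milliseconds}")
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```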
Environment: Agile, Hadoop 3.0, MS Azure, MapReduce, Java, MongoDB 4.0.2, HBase 1.2, JSON, Scala 2.12, Oozie 4.3, Zookeeper 3.4, J2EE, Python 3.7, JQuery, NoSQL, MVC, Struts 2.5.17, Hive 2.3
Confidential - Mt Laurel, NJ
Sr. Big Data Engineer
Responsibilities:
- As a Big Data Engineer, developed Big Data analytic solutions on a Hadoop-based platform and engaged clients in technical discussions.
- Responsible for building scalable distributed data solutions using Big Data technologies like Apache Hadoop, Shell Scripting, Hive.
- Worked with Microsoft Azure cloud services, Storage Accounts, Azure data storage, and Azure Data Factory.
- Used Agile methodology for data warehouse development, using Kanbanize.
- Exported event weblogs to HDFS by creating an HDFS sink which directly deposits the weblogs in HDFS.
- Worked on writing Hadoop jobs for analyzing data using Hive and Pig, accessing text format files, sequence files, and Parquet files.
- Integrated Oozie with Pig, Hive, and Sqoop and developed Oozie workflow for scheduling and orchestrating the ETL process within the Cloudera Hadoop.
- Collaborated with other data modeling team members to ensure design consistency and integrity.
- Worked with Sqoop to import and export data between databases such as MySQL and Oracle and HDFS/Hive.
- Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
- Worked on a POC to perform sentiment analysis of Twitter data using Spark Streaming.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Developed complete end-to-end big data processing in the Hadoop ecosystem.
- Developed customized classes for serialization and deserialization in Hadoop.
- Worked closely with SSIS and SSRS developers to explain complex data transformation logic.
- Worked closely with business analyst for requirement gathering and translating into technical documentation.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Created and worked Sqoop jobs with incremental load to populate Hive External tables.
- Worked in MongoDB and UNIX environments on non-SQL data clean-up and grouping, and created analysis reports.
- Continuously tuned Hive UDFs for faster queries by employing partitioning and bucketing.
- Created external tables pointing to HBase to access table with huge number of columns.
- Extensively used Pig for data cleansing using Pig scripts and Embedded Pig scripts.
- Worked on Cassandra for retrieving data from Cassandra clusters to run queries.
- Extensively used Erwin for developing data model using star schema methodologies.
- Maintained MySQL database creation, set up users, and maintained backups of cluster metadata databases.
- Worked on scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
- Imported millions of structured records from relational databases using Sqoop, processed them with Spark, and stored the data into HDFS in CSV format (see the sketch after this list).
- Developed, planned, and migrated servers, relational databases (SQL), and websites to Microsoft Azure.
- Used Spark Streaming to receive real-time data from Kafka and store the streamed data to HDFS using Scala and NoSQL databases such as HBase and Cassandra.
- Documented the requirements including the available code which should be implemented using Spark, Hive, HDFS, HBase and Elastic Search.
- Generated multiple ad-hoc Python tools and scripts to facilitate map generation and data manipulation.
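A minimal sketch of the Sqoop-to-Spark flow described above: reading Sqoop-landed CSV files from HDFS, applying basic cleansing, and persisting a partitioned Hive table. Paths, database, and column names are hypothetical:

```scala
// Minimal sketch: process Sqoop-landed CSV files from HDFS with Spark
// and persist the result as a partitioned Hive table.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SqoopedOrdersLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SqoopedOrdersLoad")
      .enableHiveSupport()
      .getOrCreate()

    // CSV files written to HDFS by a Sqoop import job (hypothetical layout).
    val raw = spark.read
      .option("header", "false")
      .option("inferSchema", "true")
      .csv("hdfs:///landing/sqoop/orders")
      .toDF("order_id", "customer_id", "order_ts", "amount")

    // Basic cleansing plus a partition column derived from the timestamp.
    val cleansed = raw
      .filter(col("order_id").isNotNull && col("amount") >= 0)
      .withColumn("order_date", to_date(col("order_ts")))

    cleansed.write
      .mode("append")
      .partitionBy("order_date")
      .format("parquet")
      .saveAsTable("warehouse_db.orders")

    spark.stop()
  }
}
```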
Environment: Big Data, Hadoop 3.0, Agile, Hive 2.3, HDFS, Oracle 12c, HBase 1.2, Flume 1.8, Pig 0.17, Oozie 4.3, SSIS, SSRS, SQL, PL/SQL, Cassandra 3.11, MongoDB, ETL, Sqoop
Confidential, Hillsboro, OR
Sr. Hadoop Developer
Responsibilities:
- Extensively worked on Hadoop ecosystem components including Hive and Spark Streaming with the MapR distribution.
- Implemented J2EE Design Patterns like DAO, Singleton, and Factory.
- Managed connectivity using JDBC for querying/inserting & data management including triggers and stored procedures.
- Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.
- Upgraded the Hadoop Cluster from CDH3 to CDH4, setting up High Availability Cluster and integrating Hive with existing applications.
- Worked on NoSQL support for enterprise production, loading data into HBase using Impala and Sqoop.
- Performed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
- Handled importing of data from various data sources, performed transformations using Hive, PIG, and loaded data into HDFS.
- Moved data using Sqoop between HDFS and relational database systems, and handled maintenance and troubleshooting.
- Developed the Java/J2EE based multi-threaded application built on top of the Struts framework.
- Used the Spring MVC framework to enable interactions with the JSP/view layer and implemented different design patterns with J2EE and XML technologies.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
- Created Hive Tables, loaded claims data from Oracle using Sqoop and loaded the processed data into target database.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
- Implemented the J2EE design patterns Data Access Object (DAO), Session Façade and Business Delegate.
- Developed NiFi flows dealing with various data formats such as XML, JSON, and Avro.
- Implemented MapReduce jobs in HIVE by querying the available data.
- Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Used Cloudera Manager for installation and management of Hadoop Cluster.
- Collaborated with business users/product owners/developers to contribute to the analysis of functional requirements.
- Implemented the application using MVC architecture, integrating the Hibernate and Spring frameworks.
- Utilized various JavaScript and jQuery libraries, Bootstrap, and Ajax for form validation and other interactive features.
- Involved in converting HiveQL queries into Spark transformations using Spark RDDs and Scala programming (see the sketch after this list).
- Integrated Kafka with Spark Streaming for high throughput and reliability.
- Worked on tuning Hive and Pig to improve performance and solved performance issues in both scripts.
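A minimal sketch of converting a HiveQL aggregation into Spark transformations with Scala, as referenced above; the claims table, its columns, and the output path are hypothetical:

```scala
// Minimal sketch: re-expressing a HiveQL aggregation as Spark RDD transformations.
//
// HiveQL being replaced (roughly):
//   SELECT member_id, SUM(claim_amount)
//   FROM claims WHERE status = 'APPROVED'
//   GROUP BY member_id;
import org.apache.spark.sql.SparkSession

object ClaimsAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ClaimsAggregation")
      .enableHiveSupport()
      .getOrCreate()

    // Read the Hive table, then filter / map / reduceByKey on its RDD.
    val claims = spark.table("claims_db.claims")   // member_id, claim_amount, status

    val totalsRdd = claims.rdd
      .filter(row => row.getAs[String]("status") == "APPROVED")
      .map(row => (row.getAs[String]("member_id"), row.getAs[Double]("claim_amount")))
      .reduceByKey(_ + _)

    totalsRdd.saveAsTextFile("hdfs:///output/claims/approved_totals")
    spark.stop()
  }
}
```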
Environment: Hadoop 3.0, Hive 2.1, J2EE, JDBC, Pig 0.16, HBase 1.1, Sqoop, NoSQL, Impala, Java, Spring, MVC, XML, Spark 1.9, PL/SQL, HDFS, JSON, Hibernate, Bootstrap, JQuery, JavaScript, Ajax
Confidential - Hartford, CT
Data Analyst/Data Engineer
Responsibilities:
- Worked as a Sr. Data Analyst/Data Engineer to review business requirements and compose source-to-target data mapping documents.
- Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
- Designed HBase schemas based on the requirements and handled HBase data migration and validation.
- Involved in migration of data from existing RDBMS (Oracle and SQL Server) to Hadoop using Sqoop for processing data.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Connected to Amazon Redshift through Tableau to extract live data for real time analysis.
- Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Imported data from different sources like HDFS/HBase into Spark RDDs and developed a data pipeline using Kafka and Storm to store data into HDFS.
- Installed and configured Apache Hadoop across multiple nodes on AWS EC2.
- Worked on moving all log files generated from various sources to HDFS for further processing.
- Wrote Hive and Scala scripts to analyze data according to business requirements (see the sketch after this list).
- Generated metadata and created Talend jobs and mappings to load the data warehouse and data lake.
- Translated business requirements into working logical and physical data models for OLTP & OLAP systems.
- Optimized query performance by modifying T-SQL queries, establishing joins, and creating clustered indexes.
- Created HBase tables to store various data formats of data coming from different sources.
- Created the system for a single source of truth on the Hadoop file system (HDFS), while enabling transparent data movement and access at various layers.
- Wrote Hive queries to parse the logs and structure them in tabular format to facilitate effective querying on the log data.
- Developed SAS macros for data cleaning, reporting, and supporting routine processing.
- Embedded SQL queries in Excel and used Excel functions to calculate parameters such as standard deviation.
- Performed data analysis and statistical analysis, and generated reports and listings using SAS/SQL, SAS/ACCESS, SAS/Excel, pivot tables, and graphs.
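A minimal sketch of the kind of Scala script referenced above, structuring raw weblogs from HDFS into a tabular Hive table for downstream querying; the log layout, paths, and table name are hypothetical and the Spark usage is assumed from the Spark RDD work mentioned in this role:

```scala
// Minimal sketch: structure raw weblogs into a queryable Hive table with Scala/Spark.
import org.apache.spark.sql.SparkSession

object WeblogToHive {
  // Very simplified common-log-style line: ip, timestamp, request, status (hypothetical format)
  private val LogLine = """^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}).*$""".r

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("WeblogToHive")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Raw log lines previously moved into HDFS.
    val parsed = spark.read.textFile("hdfs:///logs/web/")
      .flatMap { line =>
        line match {
          case LogLine(ip, ts, request, status) => Seq((ip, ts, request, status.toInt))
          case _                                => Seq.empty   // drop malformed lines
        }
      }
      .toDF("ip", "event_time", "request", "status")

    // Persist in tabular form so downstream Hive queries can run against it.
    parsed.write.mode("overwrite").saveAsTable("analytics_db.weblogs_structured")
    spark.stop()
  }
}
```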
Environment: Erwin 9.5, SAS, SQL, HBase, Scala, T-SQL, AWS, Oozie, Hive 1.9, HDFS, PL/SQL, Excel.