
Sr. Big Data Engineer/Developer Resume


Minneapolis, MN

SUMMARY:

  • 9+ years of professional experience in IT with expertise in enterprise application development, including 5+ years in Big Data analytics and the Hadoop ecosystem across a wide range of applications.
  • Excellent hands-on experience with Hadoop ecosystem components such as MapReduce, Impala, HDFS, Hive, Pig, HBase, MongoDB, Cassandra, Flume, Storm, Sqoop, Oozie, Kafka, Spark, Scala, and ZooKeeper.
  • Excellent understanding of and hands-on experience with NoSQL databases such as Cassandra, MongoDB and HBase.
  • Experienced in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology, with good knowledge of J2EE and core Java design patterns.
  • Very good hands-on experience in advanced Big Data technologies such as the Spark ecosystem (Spark SQL, MLlib, SparkR and Spark Streaming), Kafka and predictive analytics (MLlib and R ML packages, including 0xdata's ML library H2O).
  • Expertise with Hadoop ecosystem tools including Pig, Hive, HDFS, MapReduce, Sqoop, Spark, Kafka, YARN, Oozie and ZooKeeper, as well as Hadoop architecture and its components.
  • Experienced in working with QA on Hadoop projects to develop test plans, test scripts and test environments, and to understand and resolve defects.
  • Experienced with cloud deployments: Hadoop on Azure, AWS EMR and Cloudera Manager (including Hadoop run directly on EC2, non-EMR).
  • Experienced in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java, and in extending Hive and Pig core functionality with custom UDFs.
  • Experienced in database development, ETL and reporting tools using SQL Server, SQL, SSIS, SSRS, Crystal XI and SAP BO.
  • Excellent knowledge of Hadoop architecture and its major components such as MapReduce, the HDFS framework, Hive, Pig, HBase, ZooKeeper, Sqoop, Flume, Apache Tika, Weblech and Tableau.
  • Experienced in J2EE, JDBC, Servlets, Struts, Hibernate, Ajax, JavaScript, jQuery, CSS, XML and HTML.
  • Experienced in using IDEs such as Eclipse and Visual Studio, and in DBMSs such as SQL Server and MySQL.
  • Excellent experience in importing and exporting data between HDFS and relational database systems using Sqoop.
  • Strong experience designing and implementing a Cassandra-based database and related web services for storing unstructured data.
  • Good knowledge of Unified Modeling Language (UML), Object-Oriented Analysis and Design, and Agile (Scrum) methodologies.
  • Experienced in optimizing MapReduce jobs using combiners and partitioners to deliver the best results (see the sketch after this list).
  • Expertise includes team management, providing solutions across technology and process disciplines, translating business needs into technical requirements that support the organization's business objectives, and managing IT project phases from architecture and requirements gathering through onsite/offshore coordination and design specification of business functionality.
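
The combiner and partitioner optimization mentioned above can be illustrated with a minimal word-count-style sketch in Java; the job name, input/output paths and the custom partitioner are illustrative assumptions, while the mapper and reducer are Hadoop's stock TokenCounterMapper and IntSumReducer.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Partitioner;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

    public class AggregationJob {

        // Illustrative partitioner: routes keys to reducers by their first character
        // so related keys land on the same reducer.
        public static class FirstCharPartitioner extends Partitioner<Text, IntWritable> {
            @Override
            public int getPartition(Text key, IntWritable value, int numPartitions) {
                if (key.getLength() == 0) {
                    return 0;
                }
                return (key.charAt(0) & Integer.MAX_VALUE) % numPartitions;
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "aggregation-sketch");
            job.setJarByClass(AggregationJob.class);
            job.setMapperClass(TokenCounterMapper.class);
            job.setCombinerClass(IntSumReducer.class);   // combiner pre-aggregates map output locally
            job.setReducerClass(IntSumReducer.class);
            job.setPartitionerClass(FirstCharPartitioner.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }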

TECHNICAL SKILLS:

Big Data/Hadoop: HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Hue, Flume, Kafka, Spark, Storm, Scala, Hortonworks, Cloudera, Python, Impala, Apache NiFi.

NoSQL Databases: HBase, MongoDB, Cassandra

Java/J2EE Technologies: Java, J2EE, Servlets, Spring, JSP, JDBC, XML, AJAX, REST, JavaBeans, JNDI

Programming Languages: C, C++, Java, SQL, PL/SQL, Pig Latin, HiveQL, UNIX shell scripting, Scala.

Methodologies: Agile, UML, Design Patterns (Core Java and J2EE)

SQL Based Databases: Oracle, MySQL, SQL Server

Web/Application Servers: Apache Tomcat, JBoss, IBM WebSphere, WebLogic

Web Technologies: HTML5, CSS3, XML, JavaScript, jQuery, AJAX, WSDL, SOAP

Tools and IDEs: Eclipse, NetBeans, Maven, DB Visualizer, Visual Studio 2008, SQL Server Management Studio.

PROFESSIONAL EXPERIENCE:

Confidential, Minneapolis MN

Sr. Big Data Engineer/Developer

Responsibilities:

  • Involved in Big Data requirements review meetings, partnered with business analysts to clarify specific scenarios, and participated in daily meetings to discuss development progress and keep meetings productive.
  • Worked with Hadoop Ecosystem components like HBase, Sqoop, ZooKeeper, Oozie, Hive and Pig with Cloudera Hadoop distribution.
  • Developed Pig and Hive UDFs in Java to extend Pig and Hive functionality, and wrote Pig scripts for sorting, joining, filtering and grouping data (a UDF sketch follows this list).
  • Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Used AWS Data Pipeline to schedule an Amazon EMR cluster to clean and process web server logs stored in an Amazon S3 bucket.
  • Wrote Python and shell scripts for various deployment and automation processes, and wrote MapReduce programs in Python with the Hadoop streaming API.
  • Created Hive tables, loaded data and wrote Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Tested Apache Tez(TM), an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Involved in cluster coordination services through ZooKeeper and in adding new nodes to an existing cluster.
  • Worked on importing data from multiple data sources, including Google Docs, to S3/AWS and then into the data lake.
  • Implemented business logic by writing UDFs in Java, used various UDFs from Piggybank and other sources, and issued SQL queries via Impala to process data stored in HDFS and HBase.
  • Involved in developing Impala scripts for extraction, transformation and loading of data into the data warehouse.
  • Exported the analyzed data to databases such as Teradata, MySQL and Oracle using Sqoop for visualization and to generate reports for the BI team.
  • Developed ETL workflow which pushes web server logs to an Amazon S3 bucket.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster and developed simple to complex MapReduce streaming jobs in Java that were implemented using Hive and Pig.
  • Built a scalable, cost-effective and fault-tolerant data warehouse system on the Amazon Web Services (AWS) cloud.
  • Developed multiple POCs using PySpark, deployed them on the YARN cluster, compared the performance of Spark with Hive and SQL, and was involved in end-to-end implementation of ETL logic.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Created a Hive aggregator to update the Hive table after running the data profiling job and implemented Partitioning, Dynamic Partitioning and Bucketing in Hive.
  • Used Spark with YARN, compared its performance results with MapReduce, and used Cassandra to store the analyzed and processed data for scalability.
  • Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the HDFS and to run multiple Hive and Pig jobs.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, PySpark and Scala.
  • Developed Oozie workflows that were scheduled through a scheduler on a monthly basis, and managed and reviewed Hadoop log files.
  • Prepared the maintenance manual, system description document and other technical and functional documents to help the offshore team.
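
A minimal sketch of the kind of Hive UDF in Java mentioned above, using the classic org.apache.hadoop.hive.ql.exec.UDF interface; the class name and the trim/upper-case behavior are illustrative assumptions rather than the original code.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Normalizes a string column: trims whitespace and upper-cases the value.
    public final class NormalizeText extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;   // pass nulls through unchanged
            }
            return new Text(input.toString().trim().toUpperCase());
        }
    }

After packaging the class into a jar, such a UDF would typically be registered in HiveQL with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries.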

Environment: Big Data, Hadoop, MapReduce, Flume, Impala, Python, HDFS, HBase, Hive, Pig, Sqoop, Oozie, Zookeeper, Cassandra, Teradata, MySQL, Oracle, Spark, Scala, Java, UNIX shell scripting, AWS Glue, AWS S3, AWS EMR and Apache NiFi

Confidential, Chicago IL

Sr. Big Data Engineer/Developer

Responsibilities:

  • Involved in the design and development of various modules in the Hadoop Big Data platform and in processing data using MapReduce, Hive, Pig, Sqoop and Oozie.
  • Developed the technical strategy of using Apache Spark on Apache Mesos as a next-generation Big Data and "fast data" (streaming) platform.
  • Wrote Spark code in Scala to connect to HBase and read/write data to HBase tables.
  • Wrote multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and other compressed formats.
  • Copied data from HDFS to MongoDB using Pig/Hive/MapReduce scripts and visualized the processed streaming data in a Tableau dashboard.
  • Successfully loaded files to Hive and HDFS from Oracle, Netezza and SQL Server using Sqoop.
  • Designed and developed automation test scripts using Python, analyzed SQL scripts and designed the solution for implementation in PySpark.
  • Extracted data from different databases and copied it into HDFS using Sqoop, applying compression techniques to optimize data storage.
  • Developed simple to complex MapReduce jobs in Java that were implemented using Hive and Pig.
  • Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
  • Implemented ETL code to load data from multiple sources into HDFS using Pig scripts, and implemented the Flume and Spark frameworks for real-time data processing.
  • Developed batch scripts to fetch data from AWS S3 storage and perform the required transformations in Scala using the Spark framework, and worked with AWS services such as Kinesis, Lambda, EMR and EC2 for fast and efficient processing of Big Data.
  • Continuously monitored and managed the Hadoop Cluster using Cloudera Manager.
  • Developed programs in Spark based on the application for faster data processing than standard MapReduce programs.
  • Developed wrapper and utility automation scripts in Python and developed ETL jobs to load data coming from various sources, such as mainframes and flat files, into a data warehouse.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms, and created Spark Streaming code to take source files as input.
  • Developed Spark programs using Scala, was involved in creating Spark SQL queries, and developed Oozie workflows for Spark jobs.
  • Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
  • Built analytics for structured and unstructured data and managed large data ingestion using Avro, Flume, Thrift, Kafka and Sqoop.
  • Worked on scalable distributed computing systems, software architecture, data structures and algorithms using Hadoop, Apache Spark and Apache Storm, and ingested streaming data into Hadoop using the Spark and Storm frameworks with Scala.
  • Developed a data flow to pull data from a REST API using Apache NiFi with context configuration enabled, and developed entire Spark applications in Python (PySpark) on a distributed environment.
  • Installed Kafka on the Hadoop cluster and wrote producer and consumer code in Java to establish a connection from the Twitter source to HDFS (a consumer sketch follows this list).
  • Exported the analyzed patterns back to Teradata using Sqoop, organized daily scrum calls for status updates with offshore using Rally and AgileCraft, and created monthly status reports for the client.
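
A minimal sketch of the consumer side of the Kafka-to-HDFS flow described above; it uses the newer kafka-clients consumer API, and the broker address, group id and topic name are placeholders rather than the original configuration.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class TweetConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");   // placeholder broker
            props.put("group.id", "tweet-ingest");            // placeholder group id
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("tweets"));   // placeholder topic
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        // In the pipeline above, each record would be written on to HDFS here.
                        System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                    }
                }
            }
        }
    }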

Environment: Hadoop (Cloudera), MapReduce, Cloudera Manager, Python, HDFS, Hive, Pig, Spark, Storm, Flume, Thrift, Kafka, Sqoop, Oozie, Impala, SQL, Scala, Teradata, Java (JDK 1.6), Tableau, Eclipse and Informatica.

Confidential

Sr. Hadoop Developer

Responsibilities:

  • Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive, and MapReduce.
  • Supported HBase architecture design with the Hadoop architect team to develop a database design in HDFS.
  • Used an open-source web scraping framework for Python to crawl and extract data from web pages, and developed Spark applications using Scala for easy Hadoop transitions.
  • Imported data from mainframe datasets to HDFS using Sqoop; also handled importing data from various data sources (Oracle, DB2, Cassandra and MongoDB) into Hadoop and performed transformations using Hive and MapReduce.
  • Ingested a huge volume of XML files into Hadoop by utilizing DOM parsers within MapReduce; extracted daily sales, hourly sales and the product mix of items sold in stores and loaded them into the global data warehouse.
  • Wrote Pig Latin scripts, developed UDFs for Pig data analysis, and wrote Hive queries for data analysis to meet business requirements.
  • Worked with Zookeeper, Oozie, and Data Pipeline Operational Services for coordinating the cluster and scheduling workflows.
  • Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
  • Used the Python subprocess module to run UNIX shell commands, and extracted data from agent nodes into HDFS using Python scripts.
  • Developed scripts and batch jobs to schedule various Hadoop programs, and was involved in managing and reviewing Hadoop log files.
  • Extracted the needed data from the server into HDFS and bulk-loaded the cleaned data into HBase (a write sketch follows this list).
  • Involved in importing and exporting the data from RDBMS to HDFS and vice versa using Sqoop.
  • Utilized Agile Scrum methodology to help manage and organize work with developers, including regular code review sessions.
  • Upgraded the Hadoop cluster from CDH4 to CDH5 and set up a high-availability cluster to integrate Hive with existing applications.
  • Developed Spark code using Python for faster data processing.
  • Extracted meaningful data from unstructured data on the Hadoop ecosystem and developed Hive queries to process the data and generate data cubes for visualization.
  • Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop Distributed File System (HDFS)
  • Loaded the aggregated data onto Oracle from Hadoop environment using Sqoop for reporting on the dashboard.
  • Used Hive to analyze data ingested into HBase via Hive-HBase integration and computed various metrics for reporting on the dashboard.
  • Involved in loading data in different formats (Avro, Parquet) from the UNIX file system into HDFS, creating indexes and tuning SQL queries in Hive, and connecting to databases using Sqoop.
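
A minimal sketch of writing cleaned records into HBase from Java, as in the load step mentioned above; it assumes the HBase 1.x client API, and the table, row key, column family, qualifier and value are placeholders rather than the original schema.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseLoader {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();   // reads hbase-site.xml from the classpath
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("sales_clean"))) {   // placeholder table
                Put put = new Put(Bytes.toBytes("row-0001"));             // placeholder row key
                put.addColumn(Bytes.toBytes("d"),                         // placeholder column family
                              Bytes.toBytes("amount"),                    // placeholder qualifier
                              Bytes.toBytes("129.95"));                   // placeholder value
                table.put(put);
            }
        }
    }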

Environment: Hadoop, Java, MapReduce, HDFS, Hive, Pig, Linux, XML, Eclipse, Cloudera, CDH4/5 Distribution, DB2, SQL Server, Oracle 11g, MySQL, Spark, Teradata, SQL, PL/SQL

Confidential

Sr. Java/Hadoop Developer

Responsibilities:

  • Installed and maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase and Sqoop.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Developed and implemented an Asynchronous, AJAX based rich client for improved customer experience.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
  • Implemented DAO classes using the Hibernate framework for data connectivity and extraction of data according to the business logic against the Oracle database (a DAO sketch follows this list).
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS and wrote MapReduce jobs using Java API.
  • Utilized Hibernate for Object/Relational Mapping purposes for transparent persistence onto the SQL server.
  • Used Spring IoC to create beans injected at runtime, and used jQuery for client-side JavaScript methods.
  • Developed complex Hive scripts for processing the data and created dynamic partitions and bucketing in Hive to improve query performance.
  • Designed and developed re-usable web services and Java Utility classes to support XML, DOM, XML Schemas, and XSL.
  • Developed MapReduce applications using Hadoop Map-Reduce programming framework for processing and used compression techniques to optimize MapReduce Jobs.
  • Created HBase tables from Hive, wrote HiveQL statements to access HBase table data, and developed Spark programs using Scala for faster data processing.
  • Developed Pig UDFs to understand customer behavior and Pig Latin scripts for processing data in Hadoop.
  • Used Struts tag libraries and custom tag libraries extensively while coding JSP pages.
  • Scheduled automated tasks with Oozie for loading data into HDFS through Sqoop and pre-processing the data with Pig and Hive.
  • Built a custom cross-platform architecture using Java, Spring Core/MVC, Hibernate through Eclipse IDE
  • Involved in writing PL/SQL for the stored procedures.
  • Designed UI screens using JSP, Struts tags, HTML, jQuery and used JavaScript for client side validation.
  • Applied Java/J2EE best practices: minimized unnecessary object creation, encouraged proper garbage collection of unused objects, reduced database calls, and fetched data from the database in bulk to get the best application performance.
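
A minimal sketch of a Hibernate-based DAO class of the kind mentioned above; the generic entity type, method set and session handling are illustrative assumptions rather than the original implementation.

    import java.util.List;
    import org.hibernate.Session;
    import org.hibernate.SessionFactory;
    import org.hibernate.Transaction;

    // Generic DAO: persists and loads any mapped entity type through Hibernate.
    public class GenericDao<T> {
        private final SessionFactory sessionFactory;
        private final Class<T> entityClass;

        public GenericDao(SessionFactory sessionFactory, Class<T> entityClass) {
            this.sessionFactory = sessionFactory;
            this.entityClass = entityClass;
        }

        public void save(T entity) {
            Session session = sessionFactory.openSession();
            Transaction tx = session.beginTransaction();
            try {
                session.save(entity);
                tx.commit();
            } catch (RuntimeException e) {
                tx.rollback();
                throw e;
            } finally {
                session.close();
            }
        }

        @SuppressWarnings("unchecked")
        public List<T> findAll() {
            Session session = sessionFactory.openSession();
            try {
                // HQL over the mapped entity, e.g. "from com.example.Customer"
                return session.createQuery("from " + entityClass.getName()).list();
            } finally {
                session.close();
            }
        }
    }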

Environment: Hadoop, MapReduce, HDFS, Hive, Pig, HBase, Java 6, Cloudera, Linux, XML, MySQL, MySQL Workbench, Eclipse, Cassandra, Oracle, Teradata, Netezza, PL/SQL

Confidential

Java/J2EE Developer

Responsibilities:

  • Developed Enterprise JavaBeans (EJB) classes to implement various business functionalities (session beans).
  • Developed various end-user screens using JSF and Servlet technologies and UI technologies such as HTML, CSS and JavaScript.
  • Performed the necessary validations of each screen using AngularJS and jQuery, and configured the Spring configuration file to make use of the DispatcherServlet provided by Spring IoC.
  • Separated secondary functionality from primary functionality using Spring AOP, developed stored procedures for regular database cleanup, prepared test cases and provided support to the QA team in UAT.
  • Consumed web services for transferring data between different applications using RESTful APIs along with the Jersey API and JAX-RS (a client sketch follows this list).
  • Built the application using a TDD (Test-Driven Development) approach, was involved in different phases of testing such as unit testing, and was responsible for fixing bugs based on test results.
  • Involved in writing SQL statements and stored procedures, handled SQL injection, and persisted data using Hibernate Sessions, Transactions and SessionFactory objects.
  • Responsible for Hibernate Configuration and integrated Hibernate framework.
  • Analyzed and fixed the bugs reported in QTP and effectively delivered the bug fixes reported with a quick turnaround time.
  • Extensively used the Java Collections API (Lists, Sets and Maps), used PVCS for version control, and deployed the application on the JBoss server.
  • Used Jenkins to deploy the application in the testing environment, was involved in unit testing of the application using JUnit, and implemented Log4j to maintain system logs.
  • Developed the presentation layer using Servlets, JSP and MVC, and used Maven for building and deploying the application and creating JPA-based entity objects.
  • Used Spring repositories to load data from the MongoDB database to implement the DAO layer.
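
A minimal sketch of consuming a REST endpoint with the JAX-RS 2.0 client API (which Jersey implements), as mentioned above; the URL and the plain-string response handling are placeholders rather than the original service contract.

    import javax.ws.rs.client.Client;
    import javax.ws.rs.client.ClientBuilder;
    import javax.ws.rs.core.MediaType;

    public class AccountRestClient {
        public static void main(String[] args) {
            Client client = ClientBuilder.newClient();
            try {
                // GET a JSON resource and read the body as a raw string.
                String payload = client
                        .target("https://example.com/api/accounts/42")   // placeholder URL
                        .request(MediaType.APPLICATION_JSON)
                        .get(String.class);
                System.out.println(payload);
            } finally {
                client.close();
            }
        }
    }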

Environment: Java, JDK, EJB, JSF, Servlets, HTML, CSS, JavaScript, Hibernate, Struts, jQuery, Spring IoC & AOP, MongoDB, Maven, REST, Jersey, JAX-RS, JBoss, PVCS, JPA, Java Collections, Jenkins, JUnit, QA, QTP, Log4j, JMS API, JNDI, SharePoint, RAD.
