
Big Data Engineer Resume


Redmond, WA

SUMMARY

  • Over 9 years of experience as a Big Data Engineer, designing and developing solutions using big data and Java technologies.
  • Solid experience working with ETL tools such as SSIS and Informatica, and reporting tools such as SQL Server Reporting Services (SSRS), Cognos, and Business Objects.
  • Strong understanding of Hadoop daemons and MapReduce concepts.
  • Strong experience importing and exporting data to and from HDFS.
  • Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Hands-on experience with GCP, BigQuery, GCS buckets, and Stackdriver.
  • Experience with on-premises Hadoop distributions (Hortonworks, MapR) and Google Cloud Platform.
  • Hands-on experience with Hadoop, HDFS, MapReduce, and the Hadoop ecosystem (Pig, Hive, Oozie, Flume, and HBase).
  • Expertise in creating, debugging, scheduling, and monitoring jobs using Airflow and Oozie.
  • Experience building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and coordinating tasks among the team.
  • Good experience with data transformation and storage: HDFS, MapReduce, Spark.
  • Strong knowledge of NoSQL databases such as HBase, MongoDB, and Cassandra.
  • Understanding of data storage and retrieval techniques, ETL, and databases, including graph stores, relational databases, and tuple stores.
  • Good knowledge of Python collections, Python scripting, and multi-threading.
  • Familiar with handling complex data processing jobs using Cascading.
  • Strong database skills in IBM DB2 and Oracle; proficient in database development, including constraints, indexes, views, stored procedures, triggers, and cursors.
  • Extensive experience in Shell scripting.
  • Experience in component design using UML: use case, class, sequence, and component diagrams for the requirements.
  • Expertise in installing, configuring, supporting, and managing Hadoop clusters using Apache, Cloudera (CDH3, CDH4), and Hortonworks distributions, including on Amazon Web Services (AWS).
  • Excellent analytical and programming abilities in using technology to create flexible and maintainable solutions for complex development problems.
  • Good communication and presentation skills; willing to learn and adapt to new technologies and third-party products.
  • Knowledge and working experience with big data tools like Hadoop and Azure Data Lake.
  • Experience in text analytics and data mining solutions for various business problems, generating data visualizations using SAS and Python, and creating dashboards using tools like Tableau.

TECHNICAL SKILLS

Hadoop Ecosystem: Hadoop, HDFS, MapReduce, Hive, Impala, Pig, Sqoop, Oozie, Zena/Zeke scheduling, ZooKeeper, Flume, PySpark, Kafka, Spark Core, Spark SQL, Spark Streaming, AWS, Azure Data Lake

NoSQL Databases: Google BigQuery, HBase, Cassandra, MongoDB

Cloud: AWS, EC2, S3, ELK, Azure, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, Azure Data Lake and Data Factory.

Build Management Tools: Maven, Apache Ant

Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, Java Beans

Languages: C, C++, Java, SQL, PL/SQL, Pig Latin, HiveQL, UNIX shell scripting

Frameworks: MVC, Spring, Hibernate, Struts 1/2, EJB, JMS, JUnit, MRUnit

Version Control & CI: Git/GitHub, Jenkins

IDE and Tools: Eclipse 4.6, NetBeans 8.2, BlueJ

Databases: Oracle 12c/11g, Confidential SQL Server 2016/2014, DB2 & MySQL 4.x/5.x

Methodologies: Software Development Lifecycle (SDLC), Waterfall, Agile, STLC (Software Testing Life cycle), UML, Design Patterns (Core Java and J2EE)

Web Technologies: HTML5/4, DHTML, AJAX, JavaScript, jQuery and CSS3/2, JSP, Bootstrap 3/3.5

PROFESSIONAL EXPERIENCE

Confidential - Redmond, WA

Big Data Engineer

Responsibilities:

  • As a Big Data Engineer at Confidential, involved in requirement gathering for various clickstream events generated by web page clicks.
  • Collaborated on internal applications and tools for big data development.
  • Designed and developed scalable Azure APIs in Python and integrated them with Azure API Management, Logic Apps, and other Azure services.
  • Worked with Azure SQL Data Warehouse and scheduled ‘copy data load’ jobs for on-premises data.
  • Created and maintained Azure data gateways, deployments, linked services, and storage.
  • Prepared schema and metadata for the internal warehouse.
  • Wrote UDFs in Scala and stored procedures to meet specific business requirements.
  • Used Kusto Explorer for log analytics and better query response times.
  • Implemented a Spark Streaming framework that processes data from Kafka and performs analytics on top of it (see the sketch after this list).
  • Developed Spark applications using Scala and Spark-SQL for data extraction, transformation and aggregation from multiple file formats for analyzing & transforming the data to uncover insights into the customer usage patterns.
  • Built a CI/CD pipeline for code deployment to higher environments.
  • Designed the distribution strategy for tables in Azure SQL data warehouse.
  • Implemented a robust data ingestion pipeline to load real-time streaming data into HDFS using Kafka.
  • Responsible for ingesting data from Blob storage to Kusto and maintaining the PPE and PROD pipelines.
  • Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process data using the Cosmos Activity.
  • Used Kafka features such as partitioning, replication, and the distributed commit log to maintain messaging feeds.
  • Created graphical reports, tabular reports, scatter plots, geographical maps, dashboards and parameters on Power BI.
  • Performed simulation and predictive analysis of new peak times over a period of time.
  • Used Confidential R for statistical analytics, including classification and regression analysis.
  • Used Jira to create and manage user stories, bug tracking, and project progress.
  • Participated in Scrum meetings, Sprint Planning, Retrospective, and Demo at the end of a sprint in agile workflow.
  • Actively reported daily progress on the project to senior management.
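
A minimal sketch of the Kafka-to-HDFS streaming flow described above, written against the Spark Structured Streaming Java API (the production work used Scala and Spark-SQL; the broker address, topic, and output paths are illustrative assumptions):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class ClickStreamJob {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("clickstream-analytics")
                .getOrCreate();

        // Read click-stream events from Kafka (broker and topic are placeholders).
        Dataset<Row> events = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker:9092")
                .option("subscribe", "clickstream")
                .load()
                .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)");

        // Land the raw events as Parquet; downstream Spark SQL jobs aggregate usage patterns.
        StreamingQuery query = events.writeStream()
                .format("parquet")
                .option("path", "/data/clickstream/raw")
                .option("checkpointLocation", "/data/clickstream/_checkpoints")
                .start();

        query.awaitTermination();
    }
}
```

Batch Spark SQL jobs can then query the landed Parquet files to uncover insights into customer usage patterns.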

Environment: Spark 3.1, Azure, Azure API, Python 3.8, CI/CD, JSON, Kusto V3, Blob, ADF, Power BI 2.91, Kafka 2.8, Confidential R, Scala 3.0, Azure SQL, Cosmos DB and Agile Methodology.

Confidential - Atlanta, GA

Big Data Engineer

Responsibilities:

  • Installed and configured Apache Hadoop clusters for application development and Hadoop tools.
  • Installed and configured Hive, wrote Hive UDFs, and used a repository of UDFs for Pig Latin.
  • Developed data pipeline using Pig, Sqoop to ingest cargo data and customer histories into HDFS for analysis.
  • In the preprocessing phase of data extraction, used Spark to remove missing data and transform the data to create new features.
  • Installed and configured Apache Airflow for workflow management and created workflows in Python.
  • Implemented PySpark jobs in Python, utilizing DataFrames and temporary-table SQL for faster data processing.
  • Installed Kafka on the Hadoop cluster and configured producers and consumers in Java to move data from source systems, keyed by popular hashtags, into HDFS (see the sketch after this list).
  • Imported required tables from RDBMS to HDFS using Sqoop and used PySpark RDDs for real-time streaming of data.
  • Implemented microservices in Scala along with Apache Kafka.
  • Migrated Microservices to Google Cloud Platform.
  • Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies using Hadoop, HBase, Hive and Cloud Architecture.
  • Responsible for building scalable distributed data solutions using Big Data technologies like Apache Hadoop, MapReduce, Shell Scripting, Hive.
  • Used Agile (SCRUM) methodologies for Software Development.
  • Worked on POC to check various cloud offerings including Google Cloud Platform (GCP).
  • Compared self-hosted Hadoop with GCP's Dataproc, and explored Bigtable (managed HBase) use cases and performance evaluation.
  • Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
  • Developed data pipelines to consume data from Enterprise Data Lake (MapR Hadoop distribution - Hive tables/HDFS) for analytics solution.
  • Involved in all phases of data mining, data collection, data cleaning, developing models, validation and visualization.
  • Installed and configured Hadoop ecosystem like HBase, Flume, Pig and Sqoop.
  • Designed and developed Big Data analytic solutions on a Hadoop-based platform and engaged clients in technical discussions.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Responsible for loading and transforming huge sets of structured, semi-structured, and unstructured data.
  • Designed Google Cloud Dataflow jobs that move data within a 200 PB data lake.
  • Implemented scripts that load Google BigQuery data and run queries to export data.
  • Implemented business logic by writing UDFs and configuring cron jobs.
  • Extensively involved in writing PL/SQL, stored procedures, functions and packages.
  • Lead architecture and design of data processing, warehousing and analytics initiatives.
  • Developed Spark scripts by using python and bash Shell commands as per the requirement.
  • Used the Google Cloud SDK and bq command-line tool to extract data from BigQuery tables and load it into Google Cloud Storage.
  • Provided thought leadership for architecture and the design of Big Data Analytics solutions for customers, actively drive Proof of Concept (POC) and Proof of Technology (POT) evaluations and to implement a Big Data solution.
  • Developed numerous MapReduce jobs in Scala for Data Cleansing and Analyzing Data in Impala.
  • Used Confidential Windows Server and authenticated the client-server relationship via the Kerberos protocol.
  • Assigned names to columns using case classes in Scala.
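
A minimal sketch of the Java Kafka producer mentioned above, publishing hashtag-keyed events (the broker address, topic name, and record payload are illustrative assumptions):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class HashtagProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");                // placeholder broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The hashtag is the record key, so related events land in the same partition.
            producer.send(new ProducerRecord<>("hashtag-events", "#bigdata",
                    "{\"user\":\"u1\",\"ts\":1620000000}"));
            producer.flush();
        }
    }
}
```

Keying records by hashtag simplifies the downstream consumers that write partitioned output to HDFS.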

Environment: GCP, Big Query, Apache Airflow, PySpark, Hive 2.3, Hadoop 3.0, HDFS, Oracle, Flume 1.8, Pig 0.17, Sqoop 1.4, Oozie 4.3, Python

Confidential - Phoenix, AZ

Data Engineer

Responsibilities:

  • As a Data Engineer, applied Hadoop technologies to the development of analytics.
  • Assisted in leading the plan, build, and run phases within the Enterprise Analytics Team.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java and Scala for data cleaning and preprocessing.
  • Installed and configured Hive, wrote Hive UDFs, and used JUnit for unit testing of MapReduce code.
  • Responsible for automating build processes toward CI/CD automation goals.
  • Analyzed clickstream data from Google Analytics with BigQuery.
  • Designed APIs to load data from Omniture, Google Analytics, and Google BigQuery.
  • Engaged in solving and supporting real business issues using knowledge of the Hadoop Distributed File System and open-source frameworks.
  • Involved in various phases of development; analyzed and developed the system following Agile Scrum methodology.
  • Created Airflow Scheduling scripts in Python.
  • Designed efficient and robust Hadoop solutions for performance improvement and end-user experiences.
  • Developed Python code for tasks, dependencies, and time sensors for each job, enabling workflow management and automation with Airflow.
  • Worked in a Hadoop ecosystem implementation/administration, installing software patches along with system upgrades and configuration.
  • Involved in all phases of data mining, data collection, data cleaning, developing models, validation and visualization.
  • Performed Data transformations in Hive and used partitions, buckets for performance improvements.
  • Ingested data into HDFS using Sqoop and scheduled an incremental load to HDFS.
  • Extensively involved in writing PL/SQL, stored procedures, functions and packages.
  • Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
  • Created partitioned tables in Hive, designed a data warehouse using Hive external tables, and wrote Hive queries for analysis.
  • Developed Pig scripts for transforming data, extensively using joins, filters, and pre-aggregations.
  • Performed data scrubbing and processing with Apache NiFi for workflow automation and coordination.
  • Utilized sample data to create dashboards while the ETL team was cleaning data from source systems, and was responsible for later repointing the connection to Google BigQuery.
  • Used custom SQL queries to pull data into Tableau Desktop and validated the results by running SQL queries in SQL Developer and Google BigQuery.
  • Developed Pig scripts for change data capture and delta record processing between newly arrived data and existing data in HDFS.
  • Developed custom Airflow operators using Python to generate and load CSV files.
  • Developed Simple to complex streaming jobs using Python, Hive and Pig.
  • Optimized Hive queries to extract the customer information from HDFS.
  • Involved in scheduling Oozie workflow engine to run multiple Hive and Pig jobs.
  • Analyzed partitioned and bucketed data using Hive and computed various metrics for reporting.
  • Developed custom classes for serialization and deserialization in Hadoop (see the sketch after this list).
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
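
A minimal sketch of a custom Hadoop serialization class of the kind mentioned above, implementing the Writable interface (the record fields are illustrative assumptions):

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

/** Illustrative custom Writable for a click record; field names are assumed for the example. */
public class ClickRecordWritable implements Writable {
    private long timestamp;
    private String pageUrl = "";

    @Override
    public void write(DataOutput out) throws IOException {
        // Serialize fields in a fixed order.
        out.writeLong(timestamp);
        out.writeUTF(pageUrl);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Deserialize in exactly the order they were written.
        timestamp = in.readLong();
        pageUrl = in.readUTF();
    }

    public void set(long timestamp, String pageUrl) {
        this.timestamp = timestamp;
        this.pageUrl = pageUrl;
    }

    public long getTimestamp() { return timestamp; }
    public String getPageUrl() { return pageUrl; }
}
```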

Environment: GCP, Big Query, Apache Airflow, Hive 2.3, Pig 0.17, Python, HDFS, Hadoop 3.0, NOSQL, Sqoop 1.4, Oozie, Power BI, Agile, OLAP.

Confidential - Redmond, WA

Big Data/Hadoop Developer

Responsibilities:

  • Involved in Agile methodologies, daily scrum meetings, and sprint planning.
  • Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts)
  • Responsible for data extraction and data ingestion from different data sources into Hadoop Data Lake by creating ETL pipelines using Hive.
  • Built the data pipelines that will enable faster, better, data-informed decision-making within the business.
  • Identified data within different data stores, such as tables, files, folders, and documents to create a dataset in pipeline using Azure HDInsight.
  • Performed detailed analysis of business problems and technical environments and used this analysis in designing the solution and maintaining the data architecture.
  • Used data integration to manage data with speed and scalability using the Apache Spark engine in Azure Databricks.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Used Spark DataFrames, Spark SQL, and Spark MLlib extensively.
  • Developed RDDs/DataFrames in Spark using Scala and Python and applied several transformation logics to load data from the Hadoop Data Lake into MongoDB.
  • Integrated Kafka with Spark streaming for real time data processing.
  • Used Apache Spark with both Scala and Python.
  • Closely worked with data science team in building Spark MLlib applications to build various predictive models.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Continuously monitored and managed data pipeline (CI/CD) performance alongside applications from a single console with Azure Monitor.
  • Ingested data into HDFS using Sqoop and scheduled an incremental load to HDFS.
  • Worked with Hadoop infrastructure to store data in HDFS and used Hive SQL to migrate the underlying SQL codebase to Azure.
  • Uploaded streaming data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
  • Analyzed the web log data using the HiveQL to extract number of unique visitors per day, page views, visit duration, most visited page on website.
  • Worked with MongoDB using CRUD (Create, Read, Update, Delete) operations, indexing, replication, and sharding features (see the sketch after this list).
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Created Hive tables and worked on them using HiveQL.
  • Designed and implemented static and dynamic partitioning and bucketing in Hive.
  • Developed multiple POCs using PySpark, deployed them on the YARN cluster, compared the performance of Spark with Hive and SQL, and was involved in end-to-end implementation of the ETL logic.
  • Developed syllabus/Curriculum data pipelines from Syllabus/Curriculum Web Services to HBASE and Hive tables.
  • Worked on Cluster co-ordination services through Zookeeper.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Involved in building applications using Maven and integrated with CI servers like Jenkins to build jobs.
  • Exported the analyzed data to the RDBMS using Sqoop to generate reports for the BI team.
  • Worked collaboratively with all levels of business stakeholders to architect, implement and test Big Data based analytical solution from disparate sources.
  • Created cubes in Talend to build different types of aggregations on the data and visualize them.
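
A minimal sketch of the MongoDB CRUD operations mentioned above, using the MongoDB Java driver (the connection string, database, collection, and field names are illustrative assumptions):

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Indexes;
import com.mongodb.client.model.Updates;
import org.bson.Document;

public class CourseStore {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase db = client.getDatabase("curriculum");           // placeholder database
            MongoCollection<Document> courses = db.getCollection("courses");

            // Create
            courses.insertOne(new Document("courseId", "CS101").append("title", "Intro to Data"));
            // Read
            Document found = courses.find(Filters.eq("courseId", "CS101")).first();
            System.out.println(found);
            // Update
            courses.updateOne(Filters.eq("courseId", "CS101"), Updates.set("title", "Intro to Big Data"));
            // Delete
            courses.deleteOne(Filters.eq("courseId", "CS101"));
            // Index to support lookups by courseId
            courses.createIndex(Indexes.ascending("courseId"));
        }
    }
}
```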

Environment: Hadoop, HDFS, Spark, Azure, Scala, Zookeeper, Map Reduce, Hive, Pig, Sqoop, MongoDB, Java, Maven, UNIX Shell Scripting.

Confidential- Sunnyvale, CA

Sr. Java/Hadoop Developer

Responsibilities:

  • Participated in the design, development, and support of the corporate operation data store and enterprise data warehouse database environment.
  • Developed data platform from scratch and took part in requirement gathering and analysis phase of the project in documenting the business requirements.
  • Implemented Agile Methodology for building an internal application.
  • Developed Pig UDFs for manipulating data according to business requirements and also developed custom Pig loaders.
  • Developed Java MapReduce programs to transform log data into a structured form and derive user location, age group, and time spent (see the sketch after this list).
  • Implemented Row Level Updates and Real time analytics using CQL on Cassandra Data.
  • Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms
  • Collected and aggregated large amounts of web log data from different sources such as web servers, mobile and network devices using Apache Flume and stored the data into HDFS for analysis.
  • Wrote shell scripts for key Hadoop services like ZooKeeper and automated them to run via cron.
  • Developed Pig scripts for the analysis of semi-structured data.
  • Worked on the Ingestion of Files into HDFS from remote systems using MFT (Managed File Transfer)
  • Used Hibernate Transaction Management, Hibernate Batch Transactions, and cache concepts.
  • Analyzed the web log data using the HiveQL to extract number of unique visitors per day, page views, visit duration, most purchased product on website.
  • Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
  • Implemented the Capacity Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
  • Designed and implemented MapReduce based large-scale parallel processing.
  • Developed and updated the web tier modules using Struts 2.1 Framework.
  • Modified the existing JSP pages using JSTL.
  • Implemented Struts Validators for automated validation.
  • Utilized Hibernate for Object/Relational Mapping purposes for transparent persistence onto SQL Server.
  • Performed building and deployment of EAR, WAR, and JAR files on test and stage systems in the WebLogic Application Server.
  • Developed Java and J2EE applications using Rapid Application Development (RAD), Eclipse.
  • Used Singleton, DAO, DTO, Session Facade, MVC design Patterns.
  • Continuous monitoring and managing the Hadoop cluster using Cloudera Manager
  • Writing complex SQL and PL/SQL queries for stored procedures.
  • Developed Reference Architecture for E-Commerce SOA Environment
  • Used UDF's to implement business logic in Hadoop
  • Custom table creation and population, custom and package index analysis and maintenance in relation to process performance.
  • Used CVS for version control and JUnit for unit testing.
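
A minimal sketch of a Java MapReduce job of the kind described above, aggregating time spent per user from tab-delimited log lines (the log layout is an illustrative assumption; the job driver is omitted):

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

/** Assumed log layout: userId \t location \t ageGroup \t secondsSpent */
public class TimeSpentJob {

    public static class LogMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length >= 4) {
                try {
                    // Emit (userId, secondsSpent); malformed lines are skipped.
                    context.write(new Text(fields[0]), new LongWritable(Long.parseLong(fields[3])));
                } catch (NumberFormatException ignored) {
                }
            }
        }
    }

    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long total = 0;
            for (LongWritable v : values) {
                total += v.get();
            }
            // Total time spent per user.
            context.write(key, new LongWritable(total));
        }
    }
}
```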

Environment: Eclipse, Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, MySQL, Cassandra, Java, Shell Scripting, SQL.

Confidential - Gwinn, MI

Sr. Java/J2EE Developer

Responsibilities:

  • Involved in Requirement Analysis, Design, Development and Testing of the JDA Demand product.
  • Involved in the implementation of design using vital phases of the Software development life cycle (SDLC) that includes Development, Testing, Implementation and Maintenance Support.
  • Developed front-end screens using JSP, HTML, AJAX, JavaScript, Ext Js, JSON and CSS.
  • Involved in overall system's support and maintenance services such as Bug Fixing, Enhancements, Testing and Documentation
  • Developed the persistence layer using Hibernate ORM to transparently store objects in the database.
  • Responsible for coding all the JSPs and Servlets used for the module.
  • Developed JSPs, Servlets, and various beans deployed on the WebSphere server.
  • Wrote Java utility classes common for all of the applications.
  • Analyzed and fine-tuned RDBMS/SQL queries to improve application performance against the database.
  • Implemented XSLTs for transformations of the XML in Spring Web Flow.
  • Developed a POJO-based programming model using the Spring framework.
  • Used the IoC (Inversion of Control) pattern and Spring Dependency Injection for wiring and managing business objects (see the sketch after this list).
  • Handled Java multi-threading in the back-end component, with one thread running per user to serve that user.
  • Used Web Services to connect to mainframe for the validation of the data.
  • Used WSDL to expose the Web Services.
  • Participated in multiple WebEx sessions with clients/support during bug fixing.
  • Developed stored procedures, triggers, and functions in PL/SQL to process data, mapped them in the Hibernate configuration file, and established data integrity among all tables.
  • Involved in the upgrade of WebLogic and SQL Server.
  • Participated in Code Reviews of other modules, documents, test cases.
  • Performed unit testing using JUnit as well as performance and volume testing.
  • Implemented UNIX shell scripts to deploy the application.
  • Used Oracle database for data persistence.
  • Used the Log4j framework for logging debug, info, and error data.
  • Extensively worked on UNIX operating systems.
  • Used GIT as version control system.
  • Implemented the Business Services and Persistence Services to perform Business Logic.
  • Responsibilities include design for future user requirements by interacting with users, as well as new development and maintenance of the existing source code.
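
A minimal sketch of Spring IoC with constructor-based dependency injection as described above, shown with Java-based configuration for brevity (the project wired beans through the Spring context; class and bean names here are illustrative assumptions):

```java
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

public class WiringDemo {

    /** Business object whose collaborator is supplied by the container, not constructed internally. */
    public static class DemandService {
        private final DemandRepository repository;

        public DemandService(DemandRepository repository) {
            this.repository = repository;   // injected dependency
        }

        public String forecast(String sku) {
            return "forecast for " + repository.find(sku);
        }
    }

    public static class DemandRepository {
        public String find(String sku) {
            return sku;   // stand-in for a Hibernate/DAO lookup
        }
    }

    @Configuration
    static class AppConfig {
        @Bean
        public DemandRepository demandRepository() {
            return new DemandRepository();
        }

        @Bean
        public DemandService demandService(DemandRepository repository) {
            return new DemandService(repository);   // Spring resolves and injects the repository bean
        }
    }

    public static void main(String[] args) {
        AnnotationConfigApplicationContext ctx = new AnnotationConfigApplicationContext(AppConfig.class);
        System.out.println(ctx.getBean(DemandService.class).forecast("SKU-1"));
        ctx.close();
    }
}
```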

Environment: JDA, SDLC, JSP, HTML, AJAX, JavaScript, JSON, Backbone JS, XSLT, XML, Spring framework, Java, Hibernate, JUnit, UNIX Shell, Oracle, Log4j framework, GIT and CSS

Confidential

Java/J2EE Developer

Responsibilities:

  • Developed interactive GUI screens using HTML, Bootstrap, and JSP, with data validation using JavaScript.
  • Responsible for designing Rich user Interface Applications using Servlets, JavaScript, CSS, HTML, XHTML and AJAX.
  • Involved in the analysis, design, development, and testing phases of the Software Development Life Cycle (SDLC).
  • Developed code using Core Java to implement technical enhancement following Java Standards.
  • Implemented Hibernate utility classes, session factory methods, and various annotations to work with back-end database tables.
  • Identified requirement gaps and communicated with analysts to fill them.
  • Established a JSON contract for communication between the JavaScript pages and Java classes.
  • Implemented an asynchronous rich client based on AJAX and jQuery UI components to improve customer experience.
  • Extensively used Maven to manage project dependencies and build management.
  • Developed the UI panels using Spring MVC, XHTML, CSS, JavaScript, and jQuery.
  • Used Hibernate for object-relational mapping with JPA annotations (see the sketch after this list).
  • Integrated Hibernate with Struts using HibernateTemplate and used its provided methods to implement CRUD operations.
  • Used JDBC and Hibernate for persisting data to different relational databases
  • Established database connectivity using JDBC and Hibernate O/R mapping for MySQL Server.
  • Involved in creating tables using SQL, with connectivity handled by JDBC.
  • Used JPA (Java Persistence API) with Hibernate as Persistence provider for Object Relational mapping.
  • Wrote various SQL queries for data retrieval using JDBC.
  • Wrote various stored procedures in PL/SQL and JDBC routines to update tables.
  • Involved in building and parsing XML documents using SAX parser.
  • Followed good coding standards with usage of JUnit, EasyMock, and Checkstyle.
  • Build/Integration tools and Deployment using Maven 2 and Jenkins.
  • Consumed Web Services to interact with other external interfaces in order to exchange the data in the form of XML and by using SOAP.
  • Involved in splitting of big Maven projects to small projects for easy maintainability.
  • Involved in deploying and testing the application in JBoss application server.
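
A minimal sketch of Hibernate/JPA object-relational mapping and CRUD as described above (the persistence unit name, entity, and table are illustrative assumptions; Hibernate acts as the JPA provider configured in persistence.xml):

```java
import javax.persistence.*;

@Entity
@Table(name = "customers")
class Customer {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    Long id;
    String name;

    Customer() { }                         // JPA requires a no-arg constructor
    Customer(String name) { this.name = name; }
}

public class CustomerCrud {
    public static void main(String[] args) {
        // "app-unit" is a placeholder persistence unit defined in persistence.xml.
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("app-unit");
        EntityManager em = emf.createEntityManager();

        em.getTransaction().begin();
        Customer c = new Customer("Acme");
        em.persist(c);                                    // create
        Customer loaded = em.find(Customer.class, c.id);  // read
        loaded.name = "Acme Corp";                        // update (flushed on commit)
        em.remove(loaded);                                // delete
        em.getTransaction().commit();

        em.close();
        emf.close();
    }
}
```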

Environment: GUI, HTML, Bootstrap, JavaScript, Angular JS, JSP, AJAX, Struts, Servlets, Java, Hibernate, jQuery, Maven, MVC, XHTML, CSS, JPA, CRUD, JDBC
