Sr. Big Data Engineer Resume
Charlotte, NC
SUMMARY
- Over 9 years of experience in various IT technologies, including hands-on experience in Big Data and Java technologies.
- Expertise with the tools in the Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce, Sqoop, Spark, Kafka, YARN, Oozie, and ZooKeeper.
- Rich working experience in loading data into Hive tables and writing Hive queries using joins, ORDER BY, GROUP BY, etc., on data imported from RDBMS via Sqoop.
- Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
- Strong experience with Hadoop distributions such as Cloudera, MapR, and Hortonworks.
- Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases such as HBase, Cassandra, and MongoDB.
- Solid grasp of Apache Spark concepts with Scala, writing transformations in Scala for live streaming data; performed clickstream analysis using Spark with Scala, gathering data from Kafka and Flume.
- Experienced in writing complex MapReduce programs that work with different file formats such as Text, SequenceFile, XML, Parquet, and Avro.
- Experience with the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Experience in migrating data using Sqoop between HDFS and relational database systems, in both directions, according to client requirements.
- Extensive experience in importing and exporting data using stream processing platforms such as Flume and Kafka.
- Wrote Scala code for data analytics in Spark using map/reduce and byKey transformations (e.g., reduceByKey, groupByKey) to analyze real-time streaming data, as sketched after this list.
- Very good experience in the complete project life cycle (design, development, testing, and implementation) of client-server and web applications.
- Excellent Java development skills using Java, Servlets, JSP, EJB, JDBC, SOAP, and RESTful web services.
- Experience in database design using PL/SQL to write stored procedures, functions, and triggers, and strong experience in writing complex queries for Oracle.
- Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
- Strong experience in Object-Oriented Design, Analysis, Development, Testing and Maintenance.
- Excellent implementation knowledge of Enterprise/Web/Client-Server systems using Java and J2EE.
- Experienced in using agile approaches, including Extreme Programming, Test-Driven Development and Agile Scrum.
- Experience in using various IDEs (Eclipse, IntelliJ) and repositories (SVN, Git).
- Experience using build tools such as Ant and Maven.
- Strong knowledge of Spark with Scala for large-scale streaming data processing.
- Experience in designing components using UML: use case, class, sequence, deployment, and component diagrams derived from the requirements.
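A minimal Scala sketch of the Spark Streaming byKey-style aggregation mentioned above; the socket source, host/port, and event format are hypothetical placeholders, not project code.

```scala
// Illustrative sketch only: count events per user in 10-second micro-batches
// using reduceByKey on a DStream. Source, host/port, and event format are placeholders.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingByKeyDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-bykey-demo")
    val ssc = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

    // Hypothetical source: lines of "userId,action" events arriving on a socket.
    val events = ssc.socketTextStream("localhost", 9999)

    // byKey analytics: count actions per user in each micro-batch.
    val countsPerUser = events
      .map(_.split(","))
      .filter(_.length == 2)
      .map(fields => (fields(0), 1))
      .reduceByKey(_ + _) // groupByKey().mapValues(_.size) also works, but shuffles more data

    countsPerUser.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```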
TECHNICAL SKILLS
Big Data Technology: Hadoop, HDFS, Scala, MapReduce, Hive, Sqoop, Flume, Oozie, Spark, Kafka, Storm and Zookeeper.
Cloud Architecture: Amazon AWS (EC2, S3, Elasticsearch, Elastic Load Balancing), Azure (Data Lake, Data Factory, Databricks, Azure SQL Database, Azure SQL Data Warehouse), GCP (BigQuery)
Reporting Tools: Tableau, Oracle Reports, Ad-hoc, Power BI
Languages: C, Java, Python, PL/SQL, HiveQL, Unix shell scripts
Frameworks: MVC, Angular, Spring, Hibernate
NoSQL Databases: HBase, Cassandra, MongoDB
Operating Systems: Mac OS, Linux and Windows 10
Web Technologies: HTML, DHTML, XML, AJAX, JavaScript, JSP, JQuery
Databases: Oracle, SQL Server, MySQL
Tools and IDEs: Eclipse, NetBeans
Version control: SVN, CVS, GIT
Web Services: REST, SOAP, CI/CD
PROFESSIONAL EXPERIENCE
Confidential, Charlotte, NC
Sr. Big Data Engineer
Responsibilities:
- As a Big Data Engineer, assisted in leading the plan, build, and run phases within the Enterprise Analytics Team.
- Led architecture and design of data processing, warehousing, and analytics initiatives.
- Involved in requirement gathering, business analysis, design, development, testing, and implementation of business rules.
- Supported data governance to ensure the availability and integrity of data.
- Involved in migration of existing Spark jobs onto Azure Databricks.
- Designed and developed an ETL pipeline in the Azure cloud that gets customer data from an API and processes it into Azure SQL DB.
- Worked on migrating an on-prem data warehouse to Azure Synapse using ADF.
- Created pipelines to migrate data from on-prem resources through the Data Lake and load it into the Azure SQL Data Warehouse.
- Engaged in solving and supporting real business issues using Hadoop Distributed File System and open-source framework knowledge.
- Involved in various phases of development; analyzed and developed the system following Agile Scrum methodology.
- Acted as a lead resource and built the entire Hadoop platform from scratch.
- Worked closely with the DevOps team to understand, design, and develop end-to-end flow requirements, using Oozie workflows to orchestrate Hadoop jobs.
- Responsible for data governance rules and standards to maintain consistency of business element names across the different data layers.
- Performed detailed analysis of business problems and technical environments and used this data in designing the solution and maintaining the data architecture.
- Built data pipelines that enable faster, better, data-informed decision-making within the business.
- Worked on Hadoop ecosystem implementation/administration, installing software patches along with system upgrades and configuration.
- Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS, as sketched after this list.
- Applied Spark Streaming for real-time data transformation.
- Built Azure data warehouse table datasets for Power BI reports.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
- Involved in all phases of data mining, data collection, data cleaning, developing models, validation and visualization.
- Performed data transformations in Hive and used partitions and buckets for performance improvements.
- Ingested data into HDFS using Sqoop and scheduled an incremental load to HDFS.
- Worked with Hadoop infrastructure to store data in HDFS and used HiveQL to migrate the underlying SQL codebase to Azure.
- Extensively involved in writing PL/SQL, stored procedures, functions and packages.
- Created partitioned tables in Hive, designed a data warehouse using Hive external tables, and wrote Hive queries for analysis.
- Performed data scrubbing and processing with Apache NiFi for workflow automation and coordination.
- Developed Pig scripts for change data capture and delta record processing between newly arrived data and existing data in HDFS.
- Developed simple to complex streaming jobs using Python, Hive, and Azure Data Factory.
- Optimized Hive queries to extract customer information from HDFS.
- Analyzed partitioned and bucketed data using Hive and computed various metrics for reporting.
- Worked on BI reporting with AtScale OLAP for big data.
- Worked on MongoDB for distributed storage and processing.
- Used MongoDB to store processed products and commodities data, which can be consumed downstream by web applications.
- Developed Python scripts to do file validations in Databricks and automated the process using ADF.
- Developed JSON scripts for deploying the pipelines in Azure Data Factory (ADF) that process the data.
- Developed customized classes for serialization and De-serialization in Hadoop.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
Environment: Hadoop 3.3, Hive 2.3, Python, HDFS, Azure, NOSQL, Mongo DB, Sqoop 1.4, Oozie, Power BI, Agile, OLAP.
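A minimal Scala sketch of the Kafka-to-HDFS streaming ingestion referenced above, shown here with Spark Structured Streaming; the broker address, topic name, and HDFS paths are hypothetical placeholders rather than project configuration.

```scala
// Illustrative sketch only: read events from Kafka and land them in HDFS as Parquet.
import org.apache.spark.sql.SparkSession

object KafkaToHdfsStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-hdfs-stream")
      .getOrCreate()

    // Subscribe to the Kafka topic (broker and topic names are placeholders).
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "customer-events")
      .option("startingOffsets", "latest")
      .load()
      .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value", "timestamp")

    // Persist the stream to HDFS as Parquet, with checkpointing for fault tolerance.
    val query = events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/raw/customer_events")                 // placeholder path
      .option("checkpointLocation", "hdfs:///checkpoints/customer_events") // placeholder path
      .start()

    query.awaitTermination()
  }
}
```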
Confidential, Newark, NJ
Big Data Engineer
Responsibilities:
- Worked as a Data Engineer to review business requirements and compose source-to-target data mapping documents.
- Worked in an agile environment using a CI/CD methodology.
- Ingested data into this application using Hive.
- Retrieved the feedback data using Sqoop.
- Performed data migration, cleansing, transformation, integration, import, and export through Python.
- Built Data Pipeline using GCP.
- Performed checks on the clinical data in accordance with study-specific guidelines to ensure its validity.
- Designed and architected the various layers of the data lake.
- Designed star schema in BigQuery.
- Used Google Cloud Functions with Python to load data into BigQuery on arrival of CSV files in a GCS bucket.
- Wrote a program to download a SQL dump from the equipment maintenance site and load it into a GCS bucket.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Developed Spark jobs with partitioned RDDs (hash, range, and custom partitioners) for faster processing, as sketched after this list.
- Developed under Scrum methodology in a CI/CD environment using Jenkins.
- Worked on creating a POC to utilize ML models and Cloud ML for table quality analysis in the batch process.
- Defined Oozie Job flows.
- Loaded log data directly into HDFS using Flume.
- Conducted statistical analysis on Healthcare data using Python.
- Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes.
- Followed standard backup policies to ensure high availability of the cluster.
- Created action filters, parameters and calculated sets for preparing dashboards and worksheets using Power BI.
- Developed visualizations and dashboards using Power BI.
- Created dashboards for analyzing POS data using Power BI.
- Followed agile methodology for the entire project.
- Tested raw data, executed performance scripts, and assisted with data capacity planning and node forecasting.
- Worked with Jira to follow the agile approach.
Environment: Hadoop 3.3, Spark 3.1, Hive, EDW, Python, GCP, Power BI, BigQuery, CI/CD, Scala, Oozie, HDFS, ML and Agile Methodology.
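A minimal Scala sketch of the partitioned-RDD approach mentioned above; the key/value data and the custom partitioner are hypothetical and only illustrate hash, range, and custom partitioning.

```scala
// Illustrative sketch only: repartition a pair RDD with hash, range, and custom partitioners
// so that subsequent byKey operations reuse the partitioning and avoid extra shuffles.
import org.apache.spark.{HashPartitioner, Partitioner, RangePartitioner, SparkConf, SparkContext}

// Hypothetical custom partitioner: route keys by their first character.
class FirstLetterPartitioner(parts: Int) extends Partitioner {
  override def numPartitions: Int = parts
  override def getPartition(key: Any): Int =
    math.abs(key.toString.headOption.getOrElse('_').toInt) % parts
}

object PartitionedRddDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("partitioned-rdd-demo"))

    val pairs = sc.parallelize(Seq(("alpha", 1), ("beta", 2), ("gamma", 3), ("alpha", 4)))

    val hashed = pairs.partitionBy(new HashPartitioner(8))          // hash partitioning
    val ranged = pairs.partitionBy(new RangePartitioner(8, pairs))  // range partitioning
    val custom = pairs.partitionBy(new FirstLetterPartitioner(8))   // custom partitioning

    // reduceByKey on an already-partitioned RDD keeps the partitioner and skips the shuffle.
    println(hashed.reduceByKey(_ + _).collect().mkString(", "))
    println(ranged.countByKey())
    println(custom.groupByKey().mapValues(_.sum).collect().mkString(", "))

    sc.stop()
  }
}
```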
Confidential, Bellevue, WA
Hadoop Engineer
Responsibilities:
- Worked on a migration project to replace the business data warehousing system with Hadoop.
- Worked on a 40-node Hortonworks Data Platform cluster running HDP 2.1.
- Worked with highly structured and semi-structured data sets of 45 TB in size (135 TB with a replication factor of 3).
- Responsible for building scalable distributed data solutions using Hadoop.
- Worked on the Hortonworks (HDP) distribution of Hadoop.
- Used the AWS Glue Data Catalog with crawlers to catalog data from S3 and perform SQL query operations.
- Used AWS Glue for data transformation, validation, and cleansing.
- Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
- Used JSON schema to define table and column mapping from S3 data to Redshift.
- Wrote various data normalization jobs for new data ingested into Redshift.
- Defined and deployed monitoring, metrics, and logging systems on AWS.
- Designed and developed an entire change data capture (CDC) module in Python and deployed it in AWS Glue using the PySpark library.
- Developed various transformation logic using Spark SQL and Hive as part of the migration project.
- Assisted in upgrading, configuring, and maintaining various components of the Hadoop infrastructure, including Spark, Hive, and Oozie.
- Integrated Elasticsearch and implemented dynamic faceted search.
- Developed Oozie workflows to source the legacy data into Hadoop and to transform the data as per the downstream specifications.
- Configured Flume to capture the news from various sources for testing the classifier.
- Developed MapReduce jobs using various Input and output formats.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing, analyzing and training the classifier using MapReduce jobs and Hive jobs.
- Developed Spark scripts by using Scala shell commands as per the requirement.
- Involved in loading data into Cassandra NoSQL Database.
- Developed Spark applications to move data into Cassandra tables from various sources such as RDBMS or Hive, as sketched after this list.
- Worked on Spark Streaming jobs that collect data from Kafka in near real time, perform the necessary transformations and aggregations on the fly to build the common learner data model, and persist the data to Cassandra.
- Developed Python scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into RDBMS through Sqoop.
- Worked wif various data formats such as CSV, text file and Avro.
- Performed benchmarking of Sqoop data transfers and performance tuning to adhere to SLAs.
- Developed a data quality framework to check for data discrepancies using shell scripting and Hive.
Environment: Hadoop, Spark, Python, Spark SQL, Hive, Kafka, Cassandra, Oozie, Flume, Sqoop and SDLC process.
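A minimal Scala sketch of moving Hive data into a Cassandra table, as referenced above; it assumes the DataStax spark-cassandra-connector on the classpath, and the keyspace, table, and column names are hypothetical placeholders.

```scala
// Illustrative sketch only: read from a Hive table and append the rows to Cassandra.
import org.apache.spark.sql.SparkSession

object HiveToCassandra {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-cassandra")
      .config("spark.cassandra.connection.host", "cassandra-host") // placeholder host
      .enableHiveSupport()
      .getOrCreate()

    // Source data from Hive (placeholder database, table, and columns).
    val learners = spark.sql(
      "SELECT learner_id, course_id, score, updated_at FROM analytics.learner_activity")

    // Write to Cassandra through the connector's DataSource API.
    learners.write
      .format("org.apache.spark.sql.cassandra")
      .option("keyspace", "learning")       // placeholder keyspace
      .option("table", "learner_activity")  // placeholder table
      .mode("append")
      .save()

    spark.stop()
  }
}
```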
Confidential, Shelton, CT
Java Engineer
Responsibilities:
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop the programs.
- Followed agile software development with Scrum methodology.
- Migrated on-premises application to AWS.
- Used AWS services like EC2 and S3 for small data sets processing and storage.
- Responsible for building scalable distributed data solutions using Hadoop.
- Loaded data from MySQL, a relational database, into HDFS on a regular basis using Sqoop import/export.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Involved in the process of data acquisition, data pre-processing and data exploration of telecommunication project in Scala.
- Designed and developed microservices business components using Spring Boot with Spring MVC flow.
- Developed RESTful services using JAX-RS with Spring Boot and a microservice architecture to enable microservices to deploy in the cloud.
- Developed and implemented a Spring and J2EE based MVC (Model-View-Controller) framework for the application.
- Implemented application-level persistence using Hibernate and Spring.
- Worked on developing the application involving Spring MVC implementations and Restful web services.
- Responsible for designing Rich user Interface Applications using JavaScript, CSS, HTML, XHTML and AJAX.
- Used Spring AOP to configure logging for the application.
- Developed code using Core Java to implement technical enhancement following Java Standards.
- Worked with Swing and RCP using Oracle ADF to develop a search application as part of a migration project.
- Implemented Hibernate utility classes, session factory methods, and different annotations to work with back-end database tables.
- Implemented Ajax calls using JSF-Ajax integration and implemented cross-domain calls using jQuery Ajax methods.
- Implemented object-relational mapping in the persistence layer using the Hibernate framework in conjunction with Spring functionality.
- Used JPA (Java Persistence API) with Hibernate as the persistence provider for object-relational mapping.
- Used JDBC and Hibernate for persisting data to different relational databases.
- Used XML and JSON for transferring/retrieving data between different Applications.
- Wrote complex PL/SQL queries using joins, stored procedures, functions, triggers, cursors, and indexes in the data access layer.
- Implemented a RESTful web services architecture for client-server interaction and implemented the respective POJOs.
- Designed and developed SOAP web services using the CXF framework for communicating application services with different applications, and developed web service interceptors.
- Implemented the project using JAX-WS based web services with WSDL, UDDI, and SOAP to communicate with other systems.
- Involved in writing application-level code to interact with APIs and web services using AJAX, JSON, and XML.
- Wrote JUnit test cases for all the classes; worked with the Quality Assurance team in tracking and fixing bugs.
- Used Log4j to capture logs, including runtime exceptions and informational logging.
- Used Ant as the build tool and developed build files for compiling the code and creating WAR files.
- Used Tortoise SVN for Source Control and Version Management.
- Responsibilities include design for future user requirements by interacting wif users, as well as new development and maintenance of the existing source code.
Environment: JDK 1.5, Servlets, JSP, XML, JSF, Web Services (JAX-WS: WSDL, SOAP), Spring MVC, JNDI, Hibernate 3.6, JDBC, SQL, PL/SQL, HTML, DHTML, JavaScript, Ajax, Oracle 10g, SOAP, SVN, SQL, Log4j, ANT.