
Big Data Engineer Resume


New Jersey

SUMMARY:

  • A collaborative engineering professional with 12 years of IT experience in all phases of the Software Development Life Cycle (SDLC), with skills in Big Data, SAP BI, data analysis, design, development, testing and deployment of software systems, and in executing solutions for complex business problems. 6 years of experience as a Big Data Hadoop/Spark developer, with a strong background in the major cluster distributions (Cloudera and Hortonworks) and AWS, and in distributed file systems and processing frameworks in the Big Data arena, including large-scale data warehousing, real-time analytics and reporting solutions.
  • Known for using the right tools when and where they make sense and creating an intuitive architecture that helps organizations effectively analyze and process terabytes of structured and unstructured data. Hands-on experience with supervised and unsupervised machine learning for prediction and classification in support of business objectives.
  • Proven history of building large-scale data processing systems and serving as an expert in data warehousing solutions while working with a variety of database technologies.
  • Experience architecting highly scalable, distributed systems using different open source and SAP tools as well as designing and optimizing large, multi-terabyte data warehouses.
  • Able to integrate state-of-the-art Big Data technologies into the overall architecture and lead a team of developers through the construction, testing and implementation phases.
  • Consulted with business partners and made recommendations to improve the effectiveness of Big Data systems, descriptive analytics systems, and prescriptive analytics systems.
  • Integrated new tools and developed technology frameworks/prototypes to accelerate the data integration process and empower the deployment of predictive analytics. Experience designing, reviewing, implementing and optimizing data transformation processes in the Hadoop and Spark ecosystems.
  • Able to consolidate, validate and cleanse data from a vast range of sources - from applications and databases to files and Web services.

TECHNICAL SKILLS:

Databases: Oracle, HBase, Cassandra, Kafka.

Tools: Hadoop and YARN, HDFS, Hive, HBase, Sqoop, Spark & Scala, Python, Oozie, Kafka and Spark Streaming, Akka

PROFESSIONAL EXPERIENCE:

Confidential, New Jersey

Big Data Engineer

Responsibilities:

  • Designed and developed a data pipeline to ingest high-volume data into Cassandra tables using a Spark application written in Scala.
  • Designed and developed a Kafka/Spark Structured Streaming application for streaming data inflow from various sources.
  • Modeled Cassandra data with composite partition keys and clustering keys.
  • Used the Spark Cassandra Connector to process and optimize Cassandra (C*) table data.
  • Implemented an Akka actor system and actors for an event-driven, parallel data processing pipeline.
  • Configured ZooKeeper nodes and servers to communicate and coordinate across the distributed Spark cluster.
  • Used the Redis in-memory database and its data structures (hashes, sets, etc.) to define and read run-time parameters.

Confidential

Big Data Engineer

Responsibilities:

  • Designed and developed a data pipeline to ingest high-volume data into HDFS in Parquet format from multiple file system sources using a PySpark application.
  • Designed a data workflow model to create a data lake in the Hadoop ecosystem so that reporting tools like Tableau can plug in to generate the necessary reports.
  • Created Source to Target Mappings (STM) for the required tables by understanding the business requirements for the reports.
  • Developed PySpark and Spark SQL code to process the data in Apache Spark on an AWS cloud cluster and perform the necessary transformations based on the STMs developed.
  • Created Hive tables on HDFS to store the data processed by Apache Spark on the Cloudera Hadoop cluster in Parquet format.
  • Leveraged AWS S3 as the storage layer for HDFS.
  • Used GitHub/GitLab as the code repository and frequently used Git commands to clone, push and pull code to and from the Git repository.
  • Used the Hadoop ResourceManager to monitor the jobs run on the Hadoop cluster.
  • Used Confluence to store the design documents and the STMs.
  • Met with business and engineering teams on a regular basis to keep the requirements in sync and deliver on them.
  • Designed a complex Impala SQL query with multiple joins to generate a table implementing the business logic, and exported the table to downstream systems as a CSV file with a specific delimiter using a PySpark application.
  • Used Jira to keep track of the stories worked on under the Agile methodology.
  • Created a PySpark application to import data into a Hive external table in Parquet format from SQL Server database tables via a JDBC connection, with transformation logic handled using UDFs and Spark DataFrames.
  • Installed the Oozie workflow engine to run multiple Shell, Hive and Spark jobs.
  • Handled importing of data from various data sources, performed transformations using Hive and Spark, loaded data into HDFS, and extracted data from Netezza into HDFS using Sqoop.
  • Analyzed the data by performing Hive queries (HiveQL) to study customer behavior.
  • Used Impala to read, write and query the Hadoop data in HDFS.
  • Implemented dynamic partitions and buckets in Hive for efficient data access.
  • Gained practical experience implementing AWS cloud technologies, including IAM, Elastic Compute Cloud (EC2), Simple Storage Service (S3) and Elastic MapReduce (EMR).

Tools: Hadoop and YARN, HDFS, Hive, HBase, Sqoop, Spark & Scala, Python, Cassandra, Oozie, Kafka and Apache NiFi, Akka, AWS S3 and Athena

Confidential, Houston, Texas

Big Data-Hadoop/Spark Engineer

Responsibilities:

  • Created a data lake by extracting data from various sources, including Oracle, SAP systems and CSV files, into HDFS.
  • Developed Hive scripts to meet analysts' requirements for analysis.
  • Worked on improving the in-memory computing performance of Spark applications by optimizing the Spark core RDD transformations based on requirements.
  • Involved in the complete Big Data flow of the application, from data ingestion from upstream into HDFS to processing and analyzing the data in HDFS.
  • Imported large data sets from Oracle into Hive tables using Sqoop.
  • Worked with different file formats (ORCFILE, TEXTFILE, PARQUET) and compression codecs (SNAPPY).
  • Developed and maintained workflow scheduling jobs in Oozie for importing data from RDBMS to Hive.
  • Gathered business requirements, working closely with business users, project leaders and architects to translate the requirements into technical specifications.
  • Designed and developed applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Developed a Spark Streaming application to receive data streams from Kafka, process the continuous data streams and trigger actions based on fixed events.
  • Configured Spark Streaming to receive ongoing information from Kafka and store the stream information in HDFS.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames and saved it in Parquet format in HDFS.
  • Achieved near-real-time data analysis through Kafka and Spark Streaming.
  • Analyzed business requirements and designed conceptual and logical data models.
  • Developed data loaders to ingest data from different sources into Big Data lakes, covering data acquisition, storage and transformation.
  • Utilized Spark Core, Spark Streaming and Spark SQL API for faster processing of data instead of using MapReduce in Java.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
  • Developed Spark programs with Scala and applied principles of functional programming to process the complex unstructured and structured data sets.

Confidential, Houston, Texas

Big Data-Hadoop/Spark Engineer

Responsibilities:

  • Responsible for importing data from Oracle database to HDFS using Sqoop for further transformation.
  • Responsible for creating Hive tables on top of HDFS and developing Hive queries to analyze the data.
  • Helped build a Scala Spark framework that generates DataFrames from HDFS data and writes them to HBase.
  • Developed Hive tables on data using different storage formats and compression techniques.
  • Optimized data sets through partitioning and bucketing in Hive, and tuned the performance of Hive queries.
  • Designed and developed ETL workflows using Oozie, automating the extraction of data from different databases into HDFS using Sqoop scripts, transformation and analysis in Hive, and parsing of raw data using Spark.
  • Implemented efficient storage formats such as Parquet and ORC.
  • Used the Spark application master to monitor Spark jobs and capture their logs.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.
  • Used Oozie workflows to coordinate Hive scripts.
  • Worked with systems analysts and business users to understand requirements.
  • Implemented Spark applications in Scala using DataFrames, Datasets and the Spark SQL API for faster testing and processing of data.

Environment: CDH, Hadoop, MapReduce, HDFS, Hive, Sqoop.

Tools: Cloudera, Hortonworks, Hadoop, HDFS and YARN, Hive, HBase, Sqoop, Spark & Scala, Oozie, Kafka, Impala, Spark Streaming, Spark SQL and HiveContext

Confidential, Houston, Texas

SAP BI Consultant

Responsibilities:

  • Worked with Business Content and created InfoObjects, InfoCubes, DSOs, transformations and routines.
  • Set up flat-file extractions for legacy data, assigned DataSources to the data targets and defined transformations. Configured metadata in the Administration Workbench for replication and updating of DataSources from the source system.
  • Involved in creating and maintaining InfoCubes, DataSources, transformations, InfoPackages and data transfer processes.
  • Monitored data transfer from source systems into BI through the PSA, using init and delta InfoPackages for delta uploads from the R/3 system.
  • Created master data InfoObjects for uploading attributes and texts from the SAP R/3 system.
  • Performed analysis, design, development and implementation of standard/customized InfoCubes and DSOs for the MM, SD and FI modules.
  • Developed various data extraction procedures for master data (full and delta load) for attributes, texts and transactional data (full and delta) using
  • Developed generic extraction for customer-defined tables using views and applied delta extraction procedures for data loading.
  • Analyzed business reporting requirements that could be satisfied by the Business Content InfoCubes for Sales.
  • Used process chains to schedule the InfoPackages and data transfer processes.
  • Created a new DSO to hold data for the RU00 company code, including new fields created in the SAP ECC system.
  • Created a new process chain with Tidal job configuration to schedule periodic, daily and monthly loads for the DSO.
  • Applied CKFs and RKFs based on the new fields as characteristics.
  • Created a new report for analyzing loan and materials transfers.
  • Generated a new DSO and MultiProvider for the report to access data.
  • Implemented ABAP logic in an expert routine to identify sales order returns and in-warranty and out-of-warranty status based on sales document type.

Confidential

SAP BI Consultant

Responsibilities:

  • Gathered business requirements and deliverables and carried out functional analysis on SAP R/3 together with SMEs in the business areas. Extensively involved in activating Business Content for standard InfoCubes and InfoObjects of the SD and MM modules per client requirements.
  • Created transformations to replace transfer/update rules.
  • Migrated BW 3.5 DataSources to BI 7.0 DataSources.
  • Monitored data transfer from source systems into BI through the PSA, using init and delta InfoPackages for delta uploads from the R/3 system.
  • Generated reports with replacement paths, adjusted exceptions for detailed reporting, and designed exceptions, variables, structures, restricted key figures and calculated key figures in BEx queries.
  • Created BW security roles and assigned users based on user teams.

Confidential, Atlanta

SAP BI Consultant

Responsibilities:

  • Worked with Business Content and created InfoObjects, InfoCubes, DSOs, transformations and routines.
  • Extensively involved in setting up procedures for extracting data from SAP. Extracted logistics data such as Customer, Sales Order, Delivery and Billing from SAP R/3.
  • Set up flat-file extractions for legacy data, assigned DataSources to the data targets and defined transformations. Configured metadata in the Administration Workbench for replication and updating of DataSources from the source system.
  • Performed analysis, design, development and implementation of standard/customized InfoCubes and DSOs for the MM and SD modules.
  • Developed various data extraction procedures for master data (full and delta load) for attributes, texts and transactional data (full and delta) using
  • Developed generic extraction for customer-defined tables using views and applied delta extraction procedures for data loading.
  • Analyzed business reporting requirements that could be satisfied by the Business Content InfoCubes for Sales.
