Big Data Engineer Resume

Florida

PROFILE SUMMARY

  • Around 8 years of experience, with core strengths in the healthcare industry
  • Designed and implemented data ingestion techniques for batch and real time data coming from various data sources
  • Built predictive analytics models to generate actionable insights
  • Key Big Data Competencies:
  • Spark: Spark Core, Spark SQL, Spark Streaming
  • Data Collection and exploration (Scala) + Data Visualization
  • Hive performance tuning
  • Productionizing Big Data Applications
  • Working knowledge of the Spark Scala APIs
  • Knowledge of batch and streaming applications
  • Involved in preparing the estimates and delivery plans for projects of varying complexity
  • Worked with different file formats such as Parquet, text files, XML, Excel, fixed-length, and JSON.
  • Experience in using Kerberos for authenticating the end users in Hadoop in secure mode.
  • Encrypted passwords and other sensitive keys using JCEKS.
  • Experience in writing user-defined functions (UDFs) using Scala.
  • Worked in the complete Software Development Life Cycle (analysis, design, development, testing, implementation and support) using Waterfall and Agile methodologies
  • Strong team player, able to work both independently and in a team, with excellent analytical capabilities. Able to quickly adapt to new environments and learn new technologies.
  • Managed multiple tasks and worked under tight deadlines in a fast-paced environment
  • Possess good interpersonal and analytical skills and a go-getter personality

TECHNICAL SKILL-SET

Analytical Tools: SQL, Jupyter Notebook

Programming: Scala, Python (data manipulation, NumPy, Pandas, Matplotlib)

Big Data: Spark (Spark Core, Spark SQL, Spark Streaming, Scala, PySpark), Hive, Sqoop, HBase, Impala, Hadoop, HDFS, MapReduce, Shell Script

NoSQL: HBase, MongoDB

Methodologies: Agile and Waterfall model

Others: AWS, Jenkins, Control-M, S3, AWS Glue, Lambda, EC2

PROFESSIONAL EXPERIENCE

Big Data Engineer

Confidential, Florida

Responsibilities:

  • Designed and implemented data ingestion techniques for data coming from various source systems
  • Designed a generic framework to ingest data from various healthcare vendors, apply business rules, and load the data to Sailfish (IBM DB2) tables for business use
  • Responsible for designing and managing the Sqoop jobs that uploaded the data from Oracle to HDFS, and from Mainframe to HDFS and Hive.
  • Designed and implemented incremental imports into Hive tables.
  • Developed Scala programs to perform data scrubbing for unstructured data.
  • Created partitions and buckets by state in Hive to handle structured data.
  • Handled large datasets using partitioning, Spark in-memory capabilities, Spark broadcasts, efficient joins, transformations, and ingestion.
  • Worked extensively on the Core and Spark SQL modules of Spark using Scala.
  • Cleaned data per business requirements using user-defined functions (UDFs) in Spark Scala.
  • Involved in converting Hive/SQL queries into Spark transformations and actions using Spark SQL (DataFrames and Datasets) in Scala and Python.
  • Handled data exceptions and wrote invalid data to a Postgres database.
  • Implemented Spark SQL queries with Scala for faster testing and processing of data.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
  • Involved in Waterfall and Agile development methodologies and actively participated in daily scrum meetings

Environment/Tools: Apache Hadoop, Spark SQL, HDFS, Scala, SBT, IntelliJ, Hive, Sqoop, Oracle, Mainframes, HQL, Postgres, Sailfish, Hortonworks

Confidential

Big Data Engineer

Responsibilities:

  • Developed a data pipeline using Spark, Hive, and HBase to ingest data into a Hadoop cluster for analysis
  • Collected data from an AWS S3 bucket using Spark Streaming, in batch and real time, and performed the necessary transformations and aggregations to build the common learner data model and persist the data in HDFS
  • Hands-on experience in designing, developing, and maintaining software solutions on a Hadoop cluster
  • Explored Spark to improve the performance of and optimize the existing algorithms in Hadoop, using SparkContext, Spark SQL, DataFrames, and Spark on YARN
  • Used Spark Streaming to ingest data into an ingestion platform, an internally built application
  • Designed the performance tracking sheet for ETL runs across different phases of the project and shared it with the production team
  • Performed quality checks on the existing code to improve performance.
  • Imported the data from different sources, such as AWS S3 and the local file system, into Spark RDDs
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting
  • Involved in developing Hive DDLs to create, alter, and drop Hive tables
  • Involved in loading data from the Linux file system to HDFS
  • Involved in data warehousing and business intelligence systems
  • Involved in identifying and designing the most efficient and cost-effective solutions through research and evaluation of alternatives
  • Demonstrated Hadoop practices and knowledge of technical solutions, design patterns, and code for medium/complex applications deployed in Hadoop production

Environment/Tools: Spark, Hive, Pig, Spark SQL, Spark Streaming, HBase, Sqoop, Cloudera, PySpark, HDFS

Confidential

Big Data Engineer

Responsibilities:

  • Understood the system requirements and functional design.
  • Coded modules following the design specifications and standards
  • Experience with post-deployment and production activities
  • Problem correction and testing; monitoring of batch processes
  • Involved in research and development work on the ‘Next Generation Platform’
  • Environment cleanup and batch preparation activities
  • Involved in migrating data from DB2 to HDFS
  • Developed an application to apply data transformations and drop the data to MQ in JSON format.

Environment/Tools: Spark, Hive, Spark SQL, Cloudera, HDFS, Control-M, IBM MQ, Service Manager, Sqoop

Confidential

Hadoop/SQL Developer

Responsibilities:

  • Responsible for loading and transforming large sets of structured, semi-structured, and unstructured data
  • Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources
  • Extracted files from Mainframe DB and SQL Server through Sqoop and stored them in HDFS
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
  • Automated workflow using shell scripts
  • Involved in moving the final results into the HBase database for transactional and activation needs
  • Assisted with data capacity planning and node forecasting
  • Developed high-performing shell scripts to automate the jobs.
  • Handled 2 TB of data and implemented the same in production
  • Worked on a 20-node UAT Hadoop cluster for unit testing of programs
  • Working experience in Agile/Scrum methodologies

Environment/Tools: Hadoop, HDFS, HBase, Pig, Hive, Sqoop, Oracle, Mainframes, DB2, Shell Script, SQL Server
