Big Data Engineer Resume Florida - Hire IT People

PROFILE SUMMARY

Around 8 years of experience, with core strengths in the healthcare industry
Designed and implemented data ingestion techniques for batch and real-time data coming from various data sources
Built predictive analytics models to generate actionable insights
Key Big Data Competencies:
Spark: Spark Core, Spark SQL, Spark Streaming
Data Collection and exploration (Scala) + Data Visualization
Hive performance tuning
Productionizing Big Data Applications
Working knowledge of the Spark Scala APIs
Knowledge of batch and streaming applications
Involved in preparing estimates and delivery plans for projects of varying complexity
Worked with file formats including Parquet, text, XML, Excel, fixed-length, and JSON.
Experience using Kerberos to authenticate end users on Hadoop running in secure mode.
Encrypted passwords and other sensitive keys using JCEKS.
Experience writing user-defined functions (UDFs) in Scala.
Worked across the complete software development life cycle (analysis, design, development, testing, implementation, and support) using Waterfall and Agile methodologies
Strong team player who can also work independently, with excellent analytical capabilities. Quick to adapt to new environments and learn new technologies.
Managed multiple tasks and worked under tight deadlines in fast-paced environments
Possess good interpersonal and analytical skills and a go-getter personality

TECHNICAL SKILL-SET

Analytical Tools: SQL, Jupyter Notebook

Programming: Scala, Python (data manipulation, NumPy, Pandas, Matplotlib)

Big Data: Spark (Spark Core, Spark SQL, Spark Streaming, Scala, PySpark), Hive, Sqoop, HBase, Impala, Hadoop, HDFS, MapReduce, Shell Script

NoSQL: HBase, MongoDB

Methodologies: Agile and Waterfall model

Others: AWS (S3, Glue, Lambda, EC2), Jenkins, Control-M

PROFESSIONAL EXPERIENCE

Big Data Engineer

Confidential, Florida

Responsibilities:

Designed and implemented data ingestion techniques for data coming from various source systems
Designed a generic framework to ingest data from various healthcare vendors, apply business rules, and load the data into Sailfish (IBM DB2) tables for business use
Responsible for designing and managing the Sqoop jobs that uploaded data from Oracle to HDFS and from Mainframe to HDFS and Hive.
Designed and implemented incremental imports into Hive tables.
Developed Scala programs to perform data scrubbing for unstructured data.
Created partitions and buckets by state in Hive to handle structured data.
Handled large datasets using partitioning, Spark's in-memory capabilities, Spark broadcasts, efficient joins, transformations, and ingestion.
Worked extensively on the Spark Core and Spark SQL modules using Scala.
Cleaned data per business requirements using user-defined functions (UDFs) in Spark Scala.
Involved in converting Hive/SQL queries into Spark transformations and actions using Spark SQL (DataFrames and Datasets) in Scala and Python.
Handled data exceptions and wrote invalid data to a Postgres database.
Implemented Spark SQL queries in Scala for faster testing and processing of data.
Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team
Worked within Waterfall and Agile development methodologies and actively participated in daily Scrum meetings

Environment/Tools: Apache Hadoop, Spark SQL, HDFS, Scala, SBT, IntelliJ, Hive, Sqoop, Oracle, Mainframes, HQL, Postgres, Sailfish, Hortonworks

Confidential

Big Data Engineer

Responsibilities:

Developed a data pipeline using Spark, Hive, and HBase to ingest data into the Hadoop cluster for analysis
Collected data from an AWS S3 bucket using Spark Streaming, in batch and real time, performed the necessary transformations and aggregations to build the common learner data model, and persisted the data in HDFS
Hands-on experience designing, developing, and maintaining software solutions on a Hadoop cluster
Explored Spark to improve the performance and optimization of the existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and Spark on YARN
Experienced in using Spark Streaming to ingest data into an ingestion platform, an inbuilt application
Designed the ETL run performance tracking sheet across different phases of the project and shared it with the production team
Performed quality checks on the existing code to improve performance.
Imported data from different sources, such as AWS S3 and the local file system, into Spark RDDs
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python
Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting
Involved in developing Hive DDL statements to create, alter, and drop Hive tables
Involved in loading data from Linux file system to HDFS
Involved in data warehousing and business intelligence systems
Involved in identifying and designing the most efficient and cost-effective solutions through research and evaluation of alternatives
Demonstrated Hadoop practices and knowledge of technical solutions, design patterns, and code for medium/complex applications deployed in Hadoop production

Environment/Tools: Spark, Hive, Pig, Spark SQL, Spark Streaming, HBase, Sqoop, Cloudera, PySpark, HDFS

Confidential

Big Data Engineer

Responsibilities:

Understood the system requirements and functional design.
Coded modules following the design specifications and standards
Experience with post deployment and production activities
Problem correction and testing; monitoring of batch processes
Involved in Research and Development work - ‘Next Generation Platform’
Environment clean up and batch preparation activities
Involved in migrating data from DB2 to HDFS
Developed an application to apply data transformations and drop the data to MQ in JSON format.

Environment/Tools: Spark, Hive, Spark SQL, Cloudera, HDFS, Control-M, IBM MQ, Service Manager, Sqoop

Confidential

Hadoop/SQL Developer

Responsibilities:

Responsible for loading and transforming large sets of structured, semi-structured, and unstructured data
Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data coming from various sources
Extracted files from the mainframe DB and SQL Server through Sqoop and stored them in HDFS
Used Sqoop to efficiently transfer data between databases and HDFS, and used Flume to stream log data from servers.
Automated workflows using shell scripts
Involved in moving the final results into the HBase database for transactional and activation needs
Assisted with data capacity planning and node forecasting
Developed high-performing shell scripts to automate the jobs.
Handled a data volume of 2 TB and implemented the same in production
Worked on a 20-node UAT Hadoop cluster for unit testing of programs
Working experience in Agile/Scrum methodologies

Environment/Tools: Hadoop, HDFS, HBase, Pig, Hive, Sqoop, Oracle, Mainframes, DB2, Shell Script, SQL Server
