Big Data Engineer Resume
Florida
PROFILE SUMMARY
- Around 8 years of experience, with core strengths in the healthcare industry
- Designed and implemented data ingestion techniques for batch and real-time data from various sources
- Built predictive analytics models to generate actionable insights
- Key Big Data Competencies:
- Spark: Spark Core, Spark SQL, Spark Streaming
- Data collection and exploration (Scala) and data visualization
- Hive performance tuning
- Productionizing Big Data Applications
- Working knowledge of the Spark Scala APIs
- Knowledge of batch and streaming applications
- Involved in preparing estimates and delivery plans for projects of varying complexity
- Worked with file formats such as Parquet, text, XML, Excel, fixed-length, and JSON
- Experience using Kerberos to authenticate end users in Hadoop secure mode
- Encrypted passwords and other sensitive keys using JCEKS keystores
- Experience writing user-defined functions (UDFs) in Scala
- Worked across the complete Software Development Life Cycle (analysis, design, development, testing, implementation, and support) using Waterfall and Agile methodologies
- Strong team player who also works well independently, with excellent analytical capabilities and the ability to adapt quickly to new environments and learn new technologies
- Managed multiple tasks under tight deadlines in a fast-paced environment
- Good interpersonal skills and a go-getter personality
TECHNICAL SKILL-SET
Analytical Tools: SQL, Jupyter Notebook
Programming: Scala, Python (data manipulation with NumPy, Pandas, Matplotlib)
Big Data: Spark (Spark Core, Spark SQL, Spark Streaming, Scala, PySpark), Hive, Sqoop, HBase, Impala, Hadoop, HDFS, MapReduce, Shell Script
NoSQL: HBase, MongoDB
Methodologies: Agile and Waterfall model
Others: AWS (S3, AWS Glue, Lambda, EC2), Jenkins, Control-M
PROFESSIONAL EXPERIENCE
Big Data Engineer
Confidential, Florida
Responsibilities:
- Designed and implemented data ingestion techniques for data coming from various source systems
- Designed a generic framework to ingest data from various healthcare vendors, apply business rules, and load the data into Sailfish (IBM DB2) tables for business use
- Responsible for designing and managing the Sqoop jobs that loaded data from Oracle to HDFS and from mainframe to HDFS and Hive
- Designed and implemented incremental imports into Hive tables
- Developed Scala programs to perform data scrubbing for unstructured data.
- Created partitions and buckets by state in Hive to handle structured data
- Handled large datasets using partitioning, Spark in-memory processing, Spark broadcasts, efficient joins, transformations, and ingestion
- Worked extensively on the Spark Core and Spark SQL modules using Scala
- Cleaned data according to business requirements using user-defined functions (UDFs) in Spark Scala
- Involved in converting Hive/SQL queries into Spark transformations and actions using Spark SQL (DataFrames and Datasets) in Scala and Python
- Handled data exceptions and wrote invalid data to a Postgres database
- Implemented Spark SQL queries in Scala for faster testing and processing of data
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team
- Worked under Waterfall and Agile development methodologies and actively participated in daily Scrum meetings
Environment/Tools: Apache Hadoop, Spark SQL, HDFS, Scala, SBT, IntelliJ, Hive, Sqoop, Oracle, Mainframes, HQL, Postgres, Sailfish, Hortonworks
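The incremental Hive imports and invalid-data handling described above can be illustrated with a minimal, hypothetical Python sketch. It mimics the high-water-mark idea behind Sqoop's incremental append mode (a check column plus a last value): only rows whose check column exceeds the previous maximum are kept, and rows that fail a business rule are routed to a separate list standing in for the Postgres error table. The record fields (`id`, `member_id`) and the validation rule are assumptions for illustration, not the actual framework.

```python
from typing import Iterable, List, Tuple

Row = dict  # hypothetical record: {"id": int, "member_id": str, ...}


def incremental_batch(rows: Iterable[Row], last_value: int) -> Tuple[List[Row], List[Row], int]:
    """Split rows newer than last_value into valid and invalid lists.

    Returns (valid, invalid, new_last_value). new_last_value is the highest
    check-column value seen, to be stored for the next run.
    """
    valid: List[Row] = []
    invalid: List[Row] = []
    new_last = last_value
    for row in rows:
        if row["id"] <= last_value:
            continue  # already imported by a previous run
        new_last = max(new_last, row["id"])
        # Assumed business rule: member_id must be present and non-blank.
        if (row.get("member_id") or "").strip():
            valid.append(row)
        else:
            invalid.append(row)  # would be written to an error table (e.g. Postgres)
    return valid, invalid, new_last


if __name__ == "__main__":
    source = [
        {"id": 1, "member_id": "A100"},
        {"id": 2, "member_id": " "},
        {"id": 3, "member_id": "B200"},
    ]
    good, bad, watermark = incremental_batch(source, last_value=1)
    print(len(good), len(bad), watermark)  # 1 1 3
```

Storing the returned watermark between runs is what keeps each import incremental; the real jobs persisted that state through Sqoop rather than in application code.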
Confidential
Big Data Engineer
Responsibilities:
- Developed a data pipeline using Spark, Hive, and HBase to ingest data into the Hadoop cluster for analysis
- Collected data from an AWS S3 bucket in batch and in real time using Spark Streaming, performed the necessary transformations and aggregations to build the common learner data model, and persisted the data in HDFS
- Hands-on experience designing, developing, and maintaining software solutions on a Hadoop cluster
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and Spark on YARN
- Used Spark Streaming to ingest data into an in-house ingestion platform
- Designed the ETL run performance tracking sheet for different phases of the project and shared it with the production team
- Performed quality checks on the existing code to improve performance
- Imported data from sources such as AWS S3 and the local file system into Spark RDDs
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting
- Involved in developing Hive DDLs to create, alter, and drop Hive tables
- Involved in loading data from Linux file system to HDFS
- Involved in data warehousing and Business Intelligence systems
- Involved in identifying and designing the most efficient and cost-effective solutions through research and evaluation of alternatives
- Demonstrated Hadoop best practices and knowledge of technical solutions, design patterns, and code for medium-to-complex applications deployed in Hadoop production
Environment/Tools: Spark, Hive, Pig, Spark SQL, Spark Streaming, HBase, Sqoop, Cloudera, PySpark, HDFS
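The partitioned and bucketed Hive data mentioned above follows a simple layout idea: each partition value becomes a `column=value` directory, and within a partition, rows are assigned to a fixed number of buckets by hashing the bucketing column. The plain-Python sketch below illustrates only that idea. It uses `zlib.crc32` as a stand-in hash, so its bucket numbers will not match Hive's own hashing, and the `state`/`member_id` fields are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List


def bucket_for(value: str, num_buckets: int) -> int:
    """Assign a value to a bucket (stand-in hash; Hive uses its own hash function)."""
    return zlib.crc32(value.encode("utf-8")) % num_buckets


def layout(records: List[dict], partition_col: str, bucket_col: str,
           num_buckets: int) -> Dict[str, Dict[int, List[dict]]]:
    """Group records as {"col=value": {bucket_id: [records]}}."""
    result: Dict[str, Dict[int, List[dict]]] = defaultdict(lambda: defaultdict(list))
    for rec in records:
        partition = f"{partition_col}={rec[partition_col]}"
        bucket = bucket_for(str(rec[bucket_col]), num_buckets)
        result[partition][bucket].append(rec)
    # Convert nested defaultdicts to plain dicts for clean printing.
    return {p: dict(b) for p, b in result.items()}


def count_per_partition(tree: Dict[str, Dict[int, List[dict]]]) -> Dict[str, int]:
    """A simple reporting metric: number of records in each partition."""
    return {p: sum(len(rows) for rows in buckets.values()) for p, buckets in tree.items()}


if __name__ == "__main__":
    data = [
        {"state": "FL", "member_id": "A1"},
        {"state": "FL", "member_id": "A2"},
        {"state": "GA", "member_id": "B1"},
    ]
    tree = layout(data, "state", "member_id", num_buckets=4)
    print(count_per_partition(tree))  # {'state=FL': 2, 'state=GA': 1}
```

Because a query filtering on `state` only needs to read the matching `state=...` directory, partitioning by a commonly filtered column is what makes such metrics cheap to compute.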
Confidential
Big Data Engineer
Responsibilities:
- Understood the system requirements and functional design
- Coded modules following the design specifications and standards
- Handled post-deployment and production activities
- Performed problem correction, testing, and batch process monitoring
- Involved in research and development work on the ‘Next Generation Platform’
- Performed environment cleanup and batch preparation activities
- Involved in migrating data from DB2 to HDFS
- Developed an application to apply data transformations and write the data to MQ in JSON format
Environment/Tools: Spark, Hive, Spark SQL, Cloudera, HDFS, Control-M, IBM MQ, Service Manager, Sqoop
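The DB2-to-MQ application above applies transformations and emits JSON. A minimal, hypothetical sketch of the serialization step in Python: column names are normalized to lowercase keys and string values are trimmed before `json.dumps`. The transformation rules and column names are assumptions; the actual IBM MQ publishing call is deliberately left out because it depends on the client library in use.

```python
import json
from typing import Any, Dict


def to_mq_message(row: Dict[str, Any]) -> str:
    """Transform one DB2-style row into a JSON message body.

    Assumed rules: lowercase column names, trim surrounding whitespace
    from string values, leave other types unchanged.
    """
    transformed = {
        key.lower(): value.strip() if isinstance(value, str) else value
        for key, value in row.items()
    }
    # sort_keys gives a stable field order, which makes messages easy to diff and test.
    return json.dumps(transformed, sort_keys=True)


if __name__ == "__main__":
    print(to_mq_message({"MEMBER_ID": "  A100 ", "CLAIM_AMT": 250.0, "STATUS": None}))
    # {"claim_amt": 250.0, "member_id": "A100", "status": null}
```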
Confidential
Hadoop/SQL Developer
Responsibilities:
- Responsible for loading and transforming large sets of structured, semi-structured, and unstructured data
- Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data from various sources
- Extracted data from mainframe databases and SQL Server through Sqoop and stored it in HDFS
- Used Sqoop to efficiently transfer data between databases and HDFS, and used Flume to stream log data from servers
- Automated workflow using shell scripts
- Involved in moving the final results into the HBase database for transactional and activation needs
- Assisted with data capacity planning and node forecasting
- Developed high-performing shell scripts to automate the jobs
- Handled 2 TB of data and implemented the solution in production
- Worked on a 20-node UAT Hadoop cluster for unit testing of programs
- Working experience in Agile/Scrum methodologies
Environment/Tools: Hadoop, HDFS, HBase, Pig, Hive, Sqoop, Oracle, Mainframes, DB2, Shell Script, SQL Server
