
Sr. Big Data Engineer Resume


Phoenix, AZ

SUMMARY:

  • Over 10 years of experience in the IT industry, playing a major role in implementing, developing, and maintaining various web-based applications using Java and the Big Data ecosystem.
  • Around 6 years of strong end-to-end experience in Hadoop, Spark, and cloud development using a variety of Big Data tools.
  • Involved in the analysis, design, testing, and deployment of web-based, distributed, and enterprise applications.
  • Expertise in stream processing technologies such as Apache Storm and AWS Kinesis.
  • Experience in using messaging queues like Apache Kafka.
  • Extensive use of cloud computing infrastructure such as Amazon Web Services (AWS) and Azure.
  • Expertise in importing and exporting data between HDFS/Hive and relational databases using Sqoop.
  • Profound experience in Hadoop architecture and its components such as HDFS, Job Tracker, Task Tracker, Name Node, and Data Node, and in the MapReduce and Spark programming paradigms.
  • Experience in writing Map Reduce programs in Java.
  • Extensive usage of Ambari for managing the nodes and troubleshooting them.
  • Experience in optimizing Hive Queries by tuning configuration parameters.
  • Involved in designing the Hive data model for migrating the ETL process into Hadoop, and wrote Pig scripts to load data into the Hadoop environment.
  • Hands-on experience with NoSQL and columnar databases such as DynamoDB, Cassandra, and Vertica.
  • Experience in developing near-real-time workflows using Spark Streaming (a minimal sketch follows this summary).
  • Knowledge of job workflow scheduling with Oozie and cluster coordination with ZooKeeper.
  • Experienced in using messaging systems like Kafka, Azure Event Hubs, and Azure Service Bus.
  • Extensive knowledge of extracting, transforming, and loading data in formats such as text files, Avro, Parquet, and ORC, and of file compression codecs such as gzip, lz4, and snappy.
  • Good knowledge of Java, Python, Linux, and SQL Developer.
  • Extensive knowledge of data ingestion, data processing, and batch analytics.
  • Experience in indexing data using Lucene libraries in Apache Solr.
  • Experience with the ELK (Elasticsearch, Logstash, Kibana) stack, visualizing data in Kibana and Grafana dashboards.
  • Implemented microservices architecture with RESTful APIs.
  • Skilled in using the boto3 package (Python) to interact with AWS services such as S3.
  • Hands-on experience using AWS EMR (Hive, Spark, Zeppelin, and other services).
  • Working knowledge of AWS Redshift.
  • Experience migrating data infrastructure and workflows into the cloud.
  • Expertise in developing test cases for unit testing using frameworks such as JUnit and Mockito.
  • Understanding of source management tools like Git and Perforce and continuous integration (CI) tools like Jenkins.
  • Extensive usage of build tools like ANT, Maven.
  • Extensive experience in working on projects with Waterfall and Agile methodologies such as Test-Driven Development (TDD) and SCRUM.
  • Excellent interpersonal, analytical, verbal, and written communication skills.
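
As a minimal illustration of the near-real-time Spark Streaming and Kafka experience listed above, the sketch below shows a PySpark Structured Streaming job consuming a Kafka topic. The broker address, topic name, and event schema are hypothetical placeholders, and the spark-sql-kafka connector package is assumed to be available.

```python
# Minimal PySpark Structured Streaming sketch: consume a Kafka topic and print
# parsed events. Broker, topic, and schema are hypothetical placeholders; the
# spark-sql-kafka connector package is assumed to be on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

event_schema = (StructType()
                .add("event_id", StringType())
                .add("event_time", TimestampType())
                .add("payload", StringType()))

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
       .option("subscribe", "events")                       # placeholder topic
       .load())

# Kafka values arrive as bytes; cast to string and parse the JSON payload.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
             .select(from_json(col("json"), event_schema).alias("e"))
             .select("e.*"))

query = (events.writeStream
               .outputMode("append")
               .format("console")
               .start())
query.awaitTermination()
```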

TECHNICAL SKILLS:

Big Data Ecosystems: Hadoop, Map Reduce, HDFS, Apache Kafka, Apache Storm, Ambari, AWS Kinesis, Zookeeper, Apache Hive, Pig, Sqoop, Oozie, Flume, Yarn, Spark.

DB Languages: SQL, PL/SQL (Oracle)

Programming Languages: Java/J2EE, C, and Scala.

Frameworks: Spring.

Scripting Languages: JavaScript, Python, Shell Scripting.

Web Services: RESTful

Databases: RDBMS, HBase, Cassandra, MongoDB, Vertica, AWS Redshift, AWS DynamoDB.

Tools: Eclipse, NetBeans, Atom, Jupyter, PyCharm.

Testing Frameworks: JUnit, Mockito.

Source Control: Git, Bitbucket, Perforce, GitLab

Platforms: Windows, Linux, Unix

Application Servers: Apache Tomcat.

Methodologies: Agile, Waterfall

PROFESSIONAL EXPERIENCE:

Confidential, Phoenix, AZ

Sr. Big Data Engineer

Responsibilities:

  • Analyzed and defined the researchers' strategy, and determined the system architecture and requirements needed to achieve project goals.
  • Formulated strategic plans for component development to sustain future project objectives.
  • Used various Spark transformations such as map, reduceByKey, and filter to clean the input data.
  • Involved in writing custom MapReduce programs using the Java API for data processing.
  • Integrated the Maven build and designed workflows to automate the build and deploy process.
  • Involved in developing a linear regression model to predict a continuous measurement for a particular patient's observations, built using Spark with the Scala API.
  • Worked extensively with Spark and MLlib to develop a regression model for cancer data.
  • Created Hive tables as per requirements, as internal or external tables with appropriate static or dynamic partitions and bucketing for efficiency.
  • Loaded and transformed large sets of structured and semi-structured data using Hive.
  • Involved in setting up the Big Data genomics pipeline for different variant-calling techniques using Spark and Scala.
  • Used Spark and Spark SQL with the Scala API to read the Parquet data and create the tables in Hive (a rough sketch of this flow follows the list below).
  • Involved in making code changes to a tumor-simulator module so it could run across the cluster via spark-submit.
  • Involved in performing analytics and visualization on the tumor-simulator data to find the number of subclones, population size, etc.
  • Used D3.js to create phylogenetic and radial trees from the JSON data generated by the Hive queries.
  • Used the WebHDFS REST API to make HTTP GET, PUT, POST, and DELETE requests from the web server to perform analytics on the data lake (see the WebHDFS sketch after the Environment line below).
  • Worked on a POC to perform sentiment analysis of Twitter data using Spark Streaming.
  • Worked on high-performance computing (HPC) to run simulation tools required for the genomics pipeline.
  • Involved in setting up a test website on the web server to execute various tools on the HPC nodes and the Hadoop cluster.
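
The Parquet-to-Hive flow above was built with the Scala API; purely as an illustration, a roughly equivalent PySpark sketch is shown below. The HDFS path, table name, and column names are hypothetical placeholders.

```python
# Illustrative PySpark sketch of the Parquet-to-Hive flow described above.
# The project used the Scala API; the path, table, and columns here are
# hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, trim

spark = (SparkSession.builder
         .appName("parquet-to-hive-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Read raw Parquet data from HDFS (placeholder path).
variants = spark.read.parquet("hdfs:///data/genomics/variants")

# Basic cleanup: drop rows with a missing key and normalize a text column.
cleaned = (variants
           .filter(col("sample_id").isNotNull())
           .withColumn("sample_id", trim(col("sample_id"))))

# Persist as a partitioned Hive table so downstream queries stay efficient.
(cleaned.write
        .mode("overwrite")
        .partitionBy("chromosome")
        .saveAsTable("variants_clean"))

# Downstream Hive / Spark SQL queries can then hit the partitioned table.
spark.sql("SELECT chromosome, COUNT(*) AS n FROM variants_clean "
          "GROUP BY chromosome").show()
```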

Environment: Hadoop, Hive, HDFS, HPC, WebHDFS, WebHCat, Spark, Spark SQL, Java, Scala, web servers, Maven build, and SBT build.
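
The WebHDFS REST calls mentioned in this role might look roughly like the sketch below, using Python's requests library; the NameNode host, port, HDFS user, and paths are hypothetical placeholders.

```python
# Rough sketch of WebHDFS REST calls (GET/PUT/DELETE) with the requests
# library. NameNode host, port, HDFS user, and paths are hypothetical.
import requests

WEBHDFS = "http://namenode.example.com:50070/webhdfs/v1"  # assumed NameNode HTTP endpoint
USER = {"user.name": "hadoop"}                            # assumed HDFS user

# List a directory (HTTP GET, op=LISTSTATUS).
listing = requests.get(f"{WEBHDFS}/data/results", params={"op": "LISTSTATUS", **USER})
print(listing.json())

# Create a directory (HTTP PUT, op=MKDIRS).
requests.put(f"{WEBHDFS}/data/results/run1", params={"op": "MKDIRS", **USER})

# Upload a file (HTTP PUT, op=CREATE): WebHDFS first answers with a 307
# redirect to a DataNode, and the file body is then PUT to that location.
init = requests.put(f"{WEBHDFS}/data/results/run1/summary.json",
                    params={"op": "CREATE", "overwrite": "true", **USER},
                    allow_redirects=False)
with open("summary.json", "rb") as fh:
    requests.put(init.headers["Location"], data=fh)

# Delete a file (HTTP DELETE, op=DELETE).
requests.delete(f"{WEBHDFS}/data/results/run1/summary.json",
                params={"op": "DELETE", **USER})
```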

Confidential, Cambridge, MA

Sr. Big Data Engineer

Responsibilities:

  • Collaborating with our Continuous Delivery (CD) team to design, deploy, manage, and operate scalable, highly available, and fault-tolerant systems on AWS.
  • Ensure data integrity and data security within our production system.
  • Develop and deliver within our Continuous Delivery framework.
  • Migrate legacy applications to AWS.
  • Handle billions of log lines coming from several clients and analyze them using Big Data technologies such as Hadoop (HDFS), Apache Kafka, and Apache Storm.
  • Continuously improve code to handle more events coming into the cluster.
  • Scale the cluster to handle sudden spikes in incoming logs.
  • Monitor the entire cluster in Ambari and troubleshoot the Storm supervisors, Kafka brokers, and ZooKeeper.
  • Query huge data sets for event generation in analytics databases such as Vertica.
  • Query feeds/regexes in MS SQL for the URL module in the cluster.
  • Implement log metrics using log management tools such as Elasticsearch, Logstash, and Kibana (ELK stack) and visualize them in Grafana and Kibana dashboards.
  • Use YAML templating to ship the metrics through Filebeat.
  • Migrate the existing architecture to Amazon Web Services, utilizing technologies such as Kinesis, Redshift, AWS Lambda, and CloudWatch metrics.
  • Query the alerts landing in S3 buckets with Amazon Athena to find the difference in alert generation between the Kafka cluster and the Kinesis cluster.
  • Extensive use of Python to manage AWS services through the boto library (a short boto3 sketch follows this list).
  • Use cloud orchestration technologies such as Terraform to spin up the clusters.
  • Use Terraform to set up security groups and CloudWatch metrics in AWS.
  • Thorough unit testing of Java code using JUnit and Mockito.
  • Ensure data integrity and data security within the production environment.
  • Broad experience working in Linux environments.
  • Use source management tools like Git and Perforce and continuous integration (CI) tools like Jenkins.
  • Extensive use of project management tools like JIRA.
  • Experience working in an Agile methodology.
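
As a small illustration of the boto3 usage referenced in this role, the sketch below lists alert objects in an S3 bucket and publishes a custom CloudWatch metric; the bucket, prefix, namespace, and metric names are hypothetical placeholders, and AWS credentials are assumed to be configured in the environment.

```python
# Minimal boto3 sketch: count alert files in S3 and push a custom CloudWatch
# metric. Bucket, prefix, namespace, and metric names are hypothetical.
import boto3

s3 = boto3.client("s3")
cloudwatch = boto3.client("cloudwatch")

# Count alert objects under a (hypothetical) prefix in an S3 bucket.
resp = s3.list_objects_v2(Bucket="example-alerts-bucket", Prefix="alerts/2019/")
alert_count = resp.get("KeyCount", 0)

# Publish the count as a custom CloudWatch metric so it can be dashboarded
# and alarmed on alongside the cluster's other metrics.
cloudwatch.put_metric_data(
    Namespace="Example/AlertPipeline",
    MetricData=[{
        "MetricName": "AlertObjectCount",
        "Value": alert_count,
        "Unit": "Count",
    }],
)
```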

Confidential, Charlotte, NC

Big Data/Hadoop Developer

Responsibilities:

  • Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
  • Worked with HBase and Hive scripts to extract, transform, and load data into HBase and Hive.
  • Worked on moving all log files generated from various sources to HDFS for further processing.
  • Developed workflows using custom MapReduce, Pig, Hive, and Sqoop.
  • Developed a workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing it with Pig.
  • Tuned the cluster for optimal performance to process these large data sets.
  • Worked hands-on with the ETL process; handled importing data from various data sources and performed transformations in Hive.
  • Built reusable Hive UDF libraries for business requirements, which enabled users to apply these UDFs in Hive queries.
  • Loaded the generated HFiles into HBase for faster access to a large customer base without taking a performance hit.
  • Wrote a Hive UDF to sort struct fields and return a complex data type.
  • Involved in writing HQL and SQL queries.
  • Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using an MR testing library.
  • Used Maven extensively for building jar files of MapReduce programs and deployed them to the cluster.
  • Modeled Hive partitions extensively for data separation and faster data processing.
  • Worked on ad hoc queries, indexing, replication, load balancing, and aggregation in MongoDB (a pymongo sketch follows this list).
  • Worked with network and Linux system engineers to define optimal network configurations, server hardware, and operating systems.
  • Evaluated and proposed new tools and technologies to meet the needs of the organization.
  • Production support responsibilities included cluster maintenance.
  • Followed Pig and Hive best practices for tuning.
  • Gained good experience with NoSQL databases.
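
As a small illustration of the MongoDB indexing and aggregation work mentioned above, the sketch below uses pymongo; the connection string, database, collection, and field names are hypothetical placeholders.

```python
# Minimal pymongo sketch: create an index and run a simple aggregation.
# Connection string, database, collection, and field names are hypothetical.
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017/")  # placeholder connection string
orders = client["retail"]["orders"]

# Secondary index to speed up ad hoc queries on customer_id.
orders.create_index([("customer_id", ASCENDING)])

# Aggregation: total order amount per customer, highest first.
pipeline = [
    {"$group": {"_id": "$customer_id", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
    {"$limit": 10},
]
for doc in orders.aggregate(pipeline):
    print(doc["_id"], doc["total"])
```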

Environment: CDH, Hive, MySQL, HBase, HDFS, Eclipse, Hadoop, Oracle, PL/SQL, SQL*Plus, Toad 9.6, Flume, Pig, Sqoop, UNIX

Confidential

Software Engineer

Responsibilities:

  • Migrated the required data from MySQL into HDFS using Sqoop, and imported flat files of various formats into HDFS.
  • Developed Sqoop scripts to move data between Hive and the MySQL database.
  • Mainly worked on Hive queries to categorize data of different wireless applications and security systems.
  • Involved in various phases of Software Development Life Cycle (SDLC) such as requirements gathering, analysis, design and development.
  • Extensively used Core Java, Servlets, JSP and XML.
  • Worked on Action classes, Request processor, Business Delegate, Business Objects, Service classes and JSP pages
  • Developed JSP pages using Struts custom tags.
  • Used SVN for source code versioning and code repository.
  • Prepared use-case diagrams, class diagrams and sequence diagrams as part of requirement specification documentation.
  • Used basic dependency injection and commonly used dependency injection features of the Spring Framework, using Java configuration.
  • Implemented business components using Spring Core and navigation using Spring MVC.

Environment: Hadoop, Hive, MySQL, Sqoop, MVC, Struts, JSP, Servlets, JUnit, Apache Tomcat Server.

Confidential

Jr. Software Engineer

Responsibilities:

  • Documented the entire project, with detailed descriptions of all functionality.
  • Actively involved in gathering and analyzing user requirements in coordination with the business.
  • Worked as a developer creating complex stored procedures, triggers, functions, indexes, tables, views, and other T-SQL code and SQL joins for applications.
  • Extensively used T-SQL to construct user-defined functions, views, indexes, user profiles, relational database models, and data integrity.
  • Created stored procedures to validate incoming data with different data discrepancies using data conversions.
  • Generated SSRS enterprise reports from the SQL Server (OLTP) database, including various reporting features such as groups, sub-groups, adjacent groups, group totals, group sub-totals, grand totals, drill-downs, drill-through, and sub-reports.
  • Created different parameterized reports, such as standard parameter and cascading parameter reports, whose report criteria minimize report execution time and limit the number of records returned.
  • Worked on all report types, such as tables, matrices, charts, and sub-reports.
  • Created linked reports, ad hoc reports, etc., based on requirements; linked reports were created on the Report Server to reduce report duplication.
  • Designed models using Framework Manager and deployed packages to the Report Net Servers
  • Implemented security to restrict the access of users and to allow them to use only certain reports

Environment: SQL, SQL Server 2008, SQL Server Management Studio, SSRS, SSIS, T-SQL, Microsoft Excel
