Big Data Engineer Resume

Reston, VA

SUMMARY

  • 9+ years of diverse experience in Information Technology, including approximately 4+ years of Hadoop design and development experience, with a prior background in Java; also worked as a Quality Analyst.
  • Hands-on experience in designing and implementing a complete end-to-end Hadoop infrastructure using MapReduce, YARN, HDFS, Pig, Hive, Sqoop, HBase, Kafka, Flume, Apache Solr, Titan graph DB, Java Spring Boot, Hue, and Apache Phoenix.
  • Experience with CI/CD and deployment tools such as Maven, Jenkins, Bamboo, and UCD.
  • Experience working with principal Apache Hadoop components such as HDFS, Sqoop, MapReduce, Hive, HBase, Spark, and Scala.
  • Experience working with additional Hadoop ecosystem components such as Ambari, Hue, Oozie, IBM Big SQL, and IBM Maestro.
  • Expertise in creating Hive internal/external tables and views using a shared metastore and writing scripts in HiveQL.
  • Experience in designing and implementing data warehouse systems: gathering requirements, data modeling, developing ETL, and building test cases.
  • Good experience with the Guidewire PolicyCenter API.
  • Expertise in data load management, importing and exporting data using Sqoop.
  • Good knowledge of ETL procedures to transform the data in intermediate tables according to business rules and functional requirements.
  • Excellent hands-on experience in unit, integration, and functional testing.
  • Worked on job automation using shell scripting, including Bash and Perl.
  • Good experience with methodologies such as Waterfall, Agile, and Scrum.
  • Good experience in testing web services (SOAP, REST) using SoapUI tools.
  • Tested desktop and web-based applications backed by SQL Server/Oracle databases; expertise with tools such as HP Quality Center, VersionOne, ALM, and JIRA.

TECHNICAL SKILLS

Hadoop ecosystem: HDFS, MapReduce, Sqoop, Hive, Flume, Spark, ZooKeeper, Oozie, Ambari, Hue, YARN, Cloudera, HBase, Solr, JanusGraph DB

IDE’s & Utilities: Eclipse, IntelliJ

Cloud Platform: Amazon Web Services (AWS), EMR, S3, EC2

Automation Test Tools: Selenium IDE/RC/WebDriver, ScalaTest, JUnit, TestNG, FirePath, Firebug, Docker, SoapUI, Git, UCD

Defect Tracking Tools: HPSM, Team Foundation Server (2012, 2015), Quality Center 9.0/10.0, MTM, JIRA

Programming/Scripting Languages: C/C++, Python, JavaScript, Shell scripting, Scala, Spark shell, Spring Boot

Operating Systems: Windows, Linux/ Unix

RDBMS: OracleDB, SQL Server, MySQL

Testing Methodologies: Agile, Waterfall, V-Model

PROFESSIONAL EXPERIENCE

Confidential, Reston, VA

Big Data Engineer

Responsibilities:

  • Participate in day-to-day project design and architecture meetings and finalize the design.
  • Analyze the data and understand the data flow of the existing ETL system.
  • Analyze the data in the transactional tables of the existing database and create a data lineage sheet for all source and target tables, which is then used during development.
  • Extract data from the Oracle database to EMR S3 storage using Spark.
  • Develop SQL views for different levels of transformation that are used downstream for data processing.
  • Develop Spark Scala code to extract data from the Oracle database, apply transformations to the data, and use the SQL views to load the report data into AWS RDS MySQL (a minimal sketch of this flow follows the list).
  • Create Hive external tables on top of the CSV files to maintain temporary tables used in the Spark process.
  • Development is in progress for the data behind 15 reports.
  • Participate in daily Agile Scrum meetings.
  • Develop shell scripts to run the spark-submit jobs based on different config files and parameters.
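
The Oracle-to-S3 extraction and RDS load described above follows a standard Spark JDBC pattern. The sketch below is for illustration only and assumes Spark 2.x with the Oracle and MySQL JDBC drivers on the classpath; the hosts, credentials, table names, and view logic are placeholders, not the production values.

    // Minimal sketch of the Oracle -> transform -> S3 / RDS MySQL flow.
    // All URLs, credentials, and table/view names are illustrative placeholders.
    import org.apache.spark.sql.SparkSession

    object OracleToRdsReportSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("oracle-to-rds-report").getOrCreate()

        // Extract: read a source transactional table from Oracle over JDBC.
        val policyTxn = spark.read.format("jdbc")
          .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB")
          .option("dbtable", "SRC_SCHEMA.POLICY_TXN")
          .option("user", sys.env("ORA_USER"))
          .option("password", sys.env("ORA_PASS"))
          .load()

        // Stage the raw extract on S3 storage attached to the EMR cluster.
        policyTxn.write.mode("overwrite").parquet("s3://report-bucket/raw/policy_txn/")

        // Transform: apply the reporting view logic through a Spark SQL view.
        policyTxn.createOrReplaceTempView("policy_txn")
        val report = spark.sql(
          """SELECT policy_id, SUM(premium) AS total_premium
            |FROM policy_txn
            |GROUP BY policy_id""".stripMargin)

        // Load: write the report dataset to AWS RDS MySQL over JDBC.
        report.write.format("jdbc")
          .option("url", "jdbc:mysql://rds-endpoint:3306/reports")
          .option("dbtable", "policy_premium_report")
          .option("user", sys.env("RDS_USER"))
          .option("password", sys.env("RDS_PASS"))
          .mode("append")
          .save()

        spark.stop()
      }
    }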

Environment: Amazon Web Services (AWS), EC2, EMR, RDS, Oozie, Spark, Scala, Spark SQL, HDFS, Hive, Linux, IntelliJ, Oracle, Shell Scripting.

Confidential, Cary, NC

Hadoop Developer

Responsibilities:

  • Built a data flow ingestion framework (MDFE) from external sources (DB2, IBM MQ, Informatica, SQL Server) into big data ecosystems (Hive, HBase, Solr, Titan/Janus) using the big data tools Apache Spark, Spring Boot, and Spark SQL.
  • Worked with the Titan/JanusGraph database, creating vertices, edges, and search traversals to fetch claim and member details for the relevant person or group and provide real-time services.
  • Worked with the connecting partners (GSSP/SPI or API) to gather the requirements to build and support the required framework.
  • Created HBase/NoSQL designs that incorporate Slowly Changing Dimension (SCD) handling to maintain a history of updates.
  • Created Hive designs that incorporate Slowly Changing Dimensions (Type 2 and Type 4) to maintain a history of updates.
  • Developed Sqoop scripts to extract large historical and incremental datasets from legacy applications (traditional warehouses such as Oracle, DB2, and SQL Server) into the big data platform for the Data and Analytics business.
  • Created the low-level design for the ingestion, transformation, and processing of daily business datasets using Sqoop, Hive, Spark, IBM Big SQL, and IBM Maestro.
  • Automated and deployed code into production using CI/CD tools such as Bitbucket, Bamboo, and UrbanCode.
  • Created various database objects and fine-tuned the performance of SQL tables and queries.
  • Analyzed data from multiple systems of record (SOR), including disability/absence claims, members, treatments (ICD codes), and payments, and was involved in the project's design and architecture discussions.
  • Implemented proofs of concept on Spark and Scala for various source data (XML/JSON) transformations and processed data using Spark SQL.
  • Created Hive tables on top of the HBase tables.
  • Created Hive tables, loaded data, and performed analysis using Hive queries.
  • Created a Scala script to read CSV files, process them, and load the corresponding JSON files to HDFS (a minimal sketch follows the list).
  • Implemented Partitioning, Dynamic Partitions and Bucketing in Hive for efficient data access.
  • Performed hands-on data deduplication and data profiling for many production tables.
  • Worked with the Spark ecosystem using Scala and Hive queries on different data formats such as text and CSV.
  • Strong knowledge and experience in big data, developing applications using the Hadoop ecosystem, analytics, Scala, and Apache Spark.
  • Developed a Spark Scala program for transforming semi-structured and unstructured data into the structured target using Databricks and Spark SQL.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Loaded data using batch processing of data sources with Apache Spark.
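
For illustration, a minimal sketch of the CSV-to-JSON HDFS load mentioned in the Scala-script bullet above; the paths are placeholders, and a production job would supply an explicit schema rather than inferring one.

    // Minimal sketch of the CSV -> JSON load into HDFS described above.
    // Paths and options are illustrative placeholders.
    import org.apache.spark.sql.SparkSession

    object CsvToJsonLoaderSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("csv-to-json-loader").getOrCreate()

        // Read the delimited source file; the schema is inferred only for the sketch.
        val claims = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs:///data/landing/claims/claims_daily.csv")

        // Light processing step: drop rows that are entirely null before persisting.
        val cleaned = claims.na.drop("all")

        // Write the corresponding JSON files to the curated zone in HDFS.
        cleaned.write.mode("overwrite").json("hdfs:///data/curated/claims_json/")

        spark.stop()
      }
    }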

Environment: Scala, Spark, Janus, Titan, HBase, Solr, Spark SQL, HDFS, Hive, Linux, IntelliJ, Hue, Sqoop, Oracle, Shell Scripting, YARN, Ambari and Hortonworks.

Confidential, Dallas, TX

Hadoop Developer

Responsibilities:

  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (a minimal tuning sketch follows the list).
  • Implemented test scripts to support test-driven development and continuous integration.
  • Involved in scheduling the Oozie workflow engine to run multiple Spark jobs.
  • Developed the Spark data transformations programs based on Data Mapping.
  • Used GitHub and Git Bash for code submission to the GitHub repository.
  • Developed solutions to pre-process large sets of structured data in different file formats (text, XML, and JSON files).
  • Experienced with batch processing of data sources using Apache Spark.
  • Involved in production support, resolving issues quickly to keep data flowing downstream on time.
  • Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
  • Deployed code using UrbanCode Deploy, created a fully automated build and deployment platform, and coordinated code builds, promotions, and orchestrated deployments using Jenkins, Oozie, and Git.
  • Used Docker to containerize the application and all its dependencies by writing Dockerfiles and Docker Compose files. Designed, wrote, and maintained Python systems for administering Git. Used Jenkins and Bamboo as full-cycle continuous delivery tools covering package creation, distribution, and deployment onto Hadoop servers with embedded shell scripts.
  • Developed scripts for build, deployment, maintenance, and related tasks to implement a CI (continuous integration) system using Jenkins, Docker, Maven, Python, and Bash.
  • Developed and executed Bash, shell, Groovy, and Python scripts; worked closely with teams to ensure high-quality and timely delivery of builds and releases.
  • Automated various business processes and created monitoring scripts in Python to check for issues and close them automatically when manual intervention is not required.
  • Worked extensively with Bamboo and Docker for continuous integration and end-to-end automation of all builds and deployments.
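
As a hedged illustration of the Spark tuning work referenced at the top of this list, the sketch below shows two representative changes: aggregating a pair RDD with reduceByKey (which combines records per partition before the shuffle, unlike groupByKey) and persisting a DataFrame that several queries reuse. The paths, keys, and column names are hypothetical.

    // Hedged sketch of typical Spark optimizations: pair-RDD aggregation with
    // reduceByKey and caching of a reused DataFrame. Names are placeholders.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    object SparkTuningSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("spark-tuning-sketch").getOrCreate()
        val sc = spark.sparkContext

        // Pair RDD: count events per key with reduceByKey, which combines
        // values on each partition before the shuffle (unlike groupByKey).
        val events = sc.textFile("hdfs:///data/events/*.txt")
        val countsByKey = events
          .map(line => (line.split('|')(0), 1L))
          .reduceByKey(_ + _)
        countsByKey.saveAsTextFile("hdfs:///data/out/event_counts")

        // DataFrame reused by several downstream queries: persist it once
        // so the source scan is not recomputed for every query.
        val members = spark.read.parquet("hdfs:///data/members/")
        members.persist(StorageLevel.MEMORY_AND_DISK)
        members.createOrReplaceTempView("members")
        spark.sql("SELECT state, COUNT(*) AS n FROM members GROUP BY state").show()

        spark.stop()
      }
    }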

Environment: Python, Scala, Spark, Spark SQL, HDFS, Hive, Linux, IntelliJ, Oozie, Hue, Oracle, Shell Scripting, Hortonworks, UCD (UrbanCode Deploy), Docker, Jenkins, Git Bash, VersionOne

Confidential, Cary, NC

Hadoop Developer

Responsibilities:

  • Developed Spark code using Scala for data processing.
  • Migrated a legacy application to a big data application using Sqoop, Hive, Spark, and HBase.
  • Implemented proofs of concept on Spark and Scala for various source data (XML/JSON) transformations and processed data using Spark SQL.
  • Created Hive tables, loaded data, and performed analysis using Hive queries; implemented partitioning, dynamic partitions, and bucketing in Hive for efficient data access (see the sketch after this list).
  • Performed hands-on data deduplication and data profiling for many production tables.
  • Worked with the Spark ecosystem using Scala and Hive queries on different data formats such as text and CSV; implemented Spark best practices such as partitioning, caching, and checkpointing for faster data access.
  • Involved in HBase schema and row-key design, and in Solr design for contextual search features for a POC.
  • Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Explored Spark to improve the performance and optimization of the existing algorithms in Hadoop MapReduce using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Developed solutions to pre-process large sets of structured data in different file formats (text files, Avro data files, sequence files, XML and JSON files, ORC, and Parquet).
  • Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
  • Experienced with batch processing of data sources using Apache Spark.
  • Migrated tables from text format to ORC and other customized file formats, along with the associated data ingestion.
  • Designed and documented project use cases, wrote test cases, led the offshore team, and interacted with the client.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala; responsible for managing data from multiple sources and for loading and transforming large sets of structured and semi-structured data.
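
For illustration, a minimal sketch of the Hive dynamic-partitioning pattern called out above, issued here through Spark's Hive support; the table, columns, and the claims_staging source are placeholders, and the bucketing clause is omitted for brevity.

    // Hedged sketch of a partitioned, ORC-backed Hive table with a
    // dynamic-partition insert. Table and column names are placeholders.
    import org.apache.spark.sql.SparkSession

    object HiveDynamicPartitionSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-dynamic-partition-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Target table partitioned by load date and stored as ORC.
        spark.sql(
          """CREATE TABLE IF NOT EXISTS claims_curated (
            |  claim_id STRING,
            |  member_id STRING,
            |  claim_amount DOUBLE)
            |PARTITIONED BY (load_dt STRING)
            |STORED AS ORC""".stripMargin)

        // Allow dynamic partitions, then insert from the (assumed) staging
        // table so each load_dt value lands in its own partition.
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
        spark.sql(
          """INSERT OVERWRITE TABLE claims_curated PARTITION (load_dt)
            |SELECT claim_id, member_id, claim_amount, load_dt
            |FROM claims_staging""".stripMargin)

        spark.stop()
      }
    }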

Environment: Scala, Spark, Spark Streaming, Spark SQL, HDFS, Hive, Pig, Linux, Eclipse, Oozie, Hue, Apache Kafka, Sqoop, Oracle, Shell Scripting, YARN, Ambari and Hortonworks.

Confidential, Austin, TX

Software Engineer

Responsibilities:

  • Managed and reviewed Hadoop log files on clusters.
  • Performed NoSQL operations, Hadoop analytics, and event stream processing.
  • Extracted files from DB2 through Sqoop, placed them in HDFS, and processed them.
  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Created Hive tables and loaded and analyzed data using Hive queries.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Involved in unit testing of MapReduce jobs using MRUnit.
  • Involved in loading data from the Linux file system to HDFS.
  • Ran Hadoop streaming jobs to process terabytes of XML-format data.
  • Involved with application teams in on-boarding Splunk and creating dashboards/alerts/reports.
  • Responsible for exporting analyzed data to relational databases using Sqoop.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries.
  • Used GitHub and Git Bash for code submission to the GitHub repository.
  • Worked on HBase, a NoSQL database that persists high-volume user profile data.
  • Created Hive scripts to extract, transform, load (ETL), and store the data using Talend.

Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Python, Publisher/Subscriber, Linux and HBase.
