
Big Data Engineer Resume


NJ

SUMMARY

  • Over 7 years of IT experience in software development and support, with experience in developing strategic methods for deploying Big Data technologies to efficiently solve Big Data processing requirements.
  • Expertise in Hadoop ecosystem components HDFS, MapReduce, YARN, HBase, Pig, Sqoop, Spark, Spark SQL, Spring Boot, Spark Streaming, and Hive for scalability, distributed computing, and high-performance computing.
  • Experience in using Hive Query Language for data analytics.
  • Experienced in installing, maintaining, and configuring Hadoop clusters.
  • Strong knowledge of creating and monitoring Hadoop clusters on Amazon EC2, VMs, Hortonworks Data Platform 2.1 & 2.2, and CDH3/CDH4 with Cloudera Manager on Linux, Ubuntu, etc.
  • Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
  • Good knowledge of single-node and multi-node cluster configurations.
  • Strong knowledge of NoSQL databases such as the column-oriented HBase and Cassandra, as well as MongoDB and MarkLogic, and their integration with Hadoop clusters.
  • Expertise in the Scala programming language and Spark Core.
  • Worked with AWS-based data ingestion and transformations.
  • Worked with Cloudbreak and Blueprints to configure the AWS platform.
  • Extensive experience building & operating highly available, distributed systems of data extraction, ingestion, and processing of large data sets using Hadoop and Spark (Scala, Java, SQL & Python).
  • Promote a full-cycle approach including request analysis, creating/pulling datasets, report creation and implementation, and providing final analysis to the requestor.
  • Good experience with Kafka and Storm.
  • Worked with Docker to establish a connection between Spark and the Neo4j database.
  • Knowledge of the Java Virtual Machine (JVM) and multithreaded processing.
  • Hands on experience working with ANSI SQL.
  • Experience in working with job schedulers like AutoSys and Maestro.
  • Strong in databases like Sybase, DB2, Oracle, MS SQL, and Clickstream.
  • Strong understanding of Agile Scrum and Waterfall SDLC methodologies.
  • Strong working experience with Snowflake.
  • Hands-on experience with automation and monitoring tools such as Puppet, Jenkins, Chef, Ganglia, and Nagios.
  • Strong communication, collaboration & team building skills with proficiency at grasping new technical concepts quickly and utilizing them in a productive manner.
  • Adept in analyzing information system needs, evaluating end-user requirements, custom designing solutions and troubleshooting information systems.
  • Strong analytical and problem-solving skills.

TECHNICAL SKILLS

Hadoop/Big Data Technologies: HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, Impala, Spark, ZooKeeper, Cloudera Manager, Splunk

NoSQL Databases: HBase, Cassandra

Monitoring and Reporting: Tableau, Custom shell scripts

Hadoop Distributions: Hortonworks, Cloudera, MapR

Build Tools: Maven, SQL Developer

Programming & Scripting: Java, C, SQL, Shell Scripting, Python, Scala

Java Technologies: Servlets, JavaBeans, JDBC, Spring, Hibernate, SOAP/Rest services

Databases: Oracle, MySQL, MS SQL Server, Teradata

Cloud Technologies: GCP (GCS, BQ, Dataproc), AWS (S3, EC2)

Version Control: SVN, CVS, GIT

Operating Systems: Linux, Unix, Mac OS X, CentOS, Windows 10, Windows 8, Windows 7, Windows Server 2008/2003

PROFESSIONAL EXPERIENCE

Confidential, NJ

Big Data Engineer

Responsibilities:

  • Work closely with business in Requirement Analysis, Design, and Development.
  • Support code/design analysis, strategy development and project planning.
  • Understanding the source and target systems to implement the migration.
  • Using Sqoop & Teradata FastExport to export data from Teradata and ingest it into the Hadoop data reservoir.
  • Automate the daily activities using Python, build Tableau dashboards for reporting.
  • Ingest the SOR data into the Hadoop Staging/Conformed zone after performing data cleansing and TDQ checks.
  • Creating Hive tables on top of Hadoop and tuning Hive table loads and long-running queries by reviewing queries and data to save resources and meet SLAs.
  • Working closely with BI teams to build Cognos & Tableau dashboards.
  • Perform data management activities at the LOB/tenant level to adhere to risk compliance and obtain clearances from internal & external audit teams.
  • Good hands-on experience migrating databases such as Oracle and Teradata onto big data platforms such as Hadoop, Hive, and GCP BigQuery.
  • Completed GCP certified training; able to work on GCP projects.
  • Creating and maintaining the services and instances in GCP
  • Provided 24/7 production support for the production database and for the code deployed into the production environment.
  • Strong Knowledge of all phases of Software Development Life Cycle (SDLC) such as Requirement Analysis, Design, Development, Testing, UAT, Implementation and Postproduction support.
  • Played a key role in adopting the Agile methodology and implementing Scrum
  • Rewarded with Tech-Ace award for decommissioning the legacy system and saving the Teradata space.
  • Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it.
  • Design and develop test plans for ETL unit testing and integration testing.
  • Involved in converting Hive/SQL queries into Spark transformations using APIs like Spark SQL and DataFrames in Python (see the sketch after this list).
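
As a rough illustration of the Hive-to-Spark conversion work above, the following PySpark sketch shows how a HiveQL aggregation might be re-expressed with the Spark SQL and DataFrame APIs. The database, table, and column names (staging.transactions, account_id, txn_amount, conformed.account_spend) are hypothetical placeholders, not the actual project schema.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hive-enabled Spark session (assumes a configured Hive metastore)
spark = (SparkSession.builder
         .appName("hive-to-spark-conversion")
         .enableHiveSupport()
         .getOrCreate())

# Original HiveQL-style query, run as-is through Spark SQL
sql_result = spark.sql("""
    SELECT account_id, SUM(txn_amount) AS total_spend
    FROM staging.transactions
    WHERE txn_date >= '2020-01-01'
    GROUP BY account_id
""")

# Equivalent DataFrame transformation chain
df_result = (spark.table("staging.transactions")
             .filter(F.col("txn_date") >= "2020-01-01")
             .groupBy("account_id")
             .agg(F.sum("txn_amount").alias("total_spend")))

# Persist the result back as a conformed-zone Hive table
df_result.write.mode("overwrite").saveAsTable("conformed.account_spend")
```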

Environment: Hortonworks Big Data platform, Apache Hadoop, Hive, Python, Hue, ZooKeeper, MapReduce, Sqoop, Crunch API, Pig 0.10 and 0.11, HCatalog, Unix, Java, JSP, Eclipse, Maven, Oracle, SQL Server, Linux, MySQL.

Confidential, NJ

Azure Data Engineer

Responsibilities:

  • Analyze, design, and build modern data solutions using Azure PaaS services to support visualization of data. Understand the current production state of the application and determine the impact of new implementations on existing business processes.
  • Extract, Transform, and Load data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingest data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and process the data in Azure Databricks.
  • Migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, and to write data back.
  • Experience in all phases of the System life Cycle including project definition, analysis, design, coding, testing, implementation and Production support.
  • Actively involved in deployment and post release support.
  • Worked on the Analytics Infrastructure team to develop a stream filtering system on top of Apache Kafka and Storm.
  • Implemented large Lambda architectures using Azure Data platform capabilities like Azure Data Lake, Azure Data Factory, HDInsight, Azure SQL Server, Azure ML, and Power BI.
  • Demonstrated expert level technical capabilities in areas of Azure Batch and Interactive solutions, Azure Machine learning solutions and operationalizing end to end Azure Cloud Analytics solutions.
  • Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
  • To meet specific business requirements, wrote UDFs in Scala and PySpark (see the PySpark sketch after this list).
  • Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using the SQL activity.
  • Hands-on experience developing SQL scripts for automation purposes.
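
A minimal PySpark sketch of the kind of multi-format extraction, UDF, and aggregation work described above; the mount paths, column names, and bucketing logic are illustrative assumptions rather than the actual Databricks job.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

# The same usage feed arrives as CSV and Parquet (placeholder mount paths)
csv_usage = (spark.read
             .option("header", True)
             .option("inferSchema", True)
             .csv("/mnt/raw/usage_csv/"))
parquet_usage = spark.read.parquet("/mnt/raw/usage_parquet/")
usage = csv_usage.unionByName(parquet_usage)

# Simple PySpark UDF that buckets device names into coarse families
@F.udf(returnType=StringType())
def device_family(device_name):
    name = (device_name or "").lower()
    if "iphone" in name or "ios" in name:
        return "apple"
    if "android" in name:
        return "android"
    return "other"

# Aggregate usage per device family to surface customer usage patterns
insights = (usage
            .withColumn("family", device_family(F.col("device_name")))
            .groupBy("family")
            .agg(F.count(F.lit(1)).alias("events"),
                 F.countDistinct("customer_id").alias("customers")))

insights.write.mode("overwrite").parquet("/mnt/curated/usage_by_family/")
```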

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Spark, Kafka, IntelliJ, ADF, Cosmos, sbt, Zeppelin, YARN, Scala, SQL, Git.

Confidential, Richmond, VA

Big Data Developer

Responsibilities:

  • Processed big data using a Hadoop cluster consisting of 40 nodes.
  • Designed and configured Flume servers to collect data from the network proxy servers and store it in HDFS.
  • Loaded the customer profile data, customer spending data, and credit data from legacy warehouses onto HDFS using Sqoop.
  • Applied transformations and filtering to the traffic data using Pig.
  • Used pattern-matching algorithms to recognize customers across different sources, built risk profiles for each customer using Hive, and stored the results in HBase.
  • Performed unit testing using MRUnit.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly for building the common learner data model, which gets data from Kafka in near real time and persists it into Cassandra (see the streaming sketch after this list).
  • Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
  • Designed and developed a POC in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Hands-on experience with AWS Cloud services such as Redshift clusters and Route 53 domain configuration.
  • Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive; involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
  • Responsible for building scalable distributed data solutions using Hadoop
  • Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster
  • Setup and benchmarked Hadoop/HBase clusters for internal use
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Involved in converting MapReduce programs into Spark RDD transformations in Python (see the RDD sketch after this list).
  • Developed Spark scripts using Python shell commands as per the requirements.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Developed merge jobs in Python to extract and load data into the MySQL database.
  • Analyzed the data by performing Hive queries and running Pig scripts to study employee behavior.
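
A hedged sketch of the Kafka-to-Cassandra pipeline referenced above, written here with Spark Structured Streaming and a foreachBatch sink (the original work used the Spark Streaming APIs); the broker address, topic, schema, and keyspace/table names are placeholders, and the DataStax spark-cassandra-connector is assumed to be on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = (SparkSession.builder
         .appName("learner-stream")
         .config("spark.cassandra.connection.host", "cassandra-host")
         .getOrCreate())

event_schema = StructType([
    StructField("learner_id", StringType()),
    StructField("course_id", StringType()),
    StructField("score", DoubleType()),
])

# Read learner events from Kafka in near real time
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "learner-events")
       .load())

events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(F.from_json("json", event_schema).alias("e"))
          .select("e.*"))

def write_to_cassandra(batch_df, batch_id):
    # Persist each micro-batch into the common learner model table
    (batch_df.write
     .format("org.apache.spark.sql.cassandra")
     .options(keyspace="learner", table="common_learner_model")
     .mode("append")
     .save())

query = (events.writeStream
         .foreachBatch(write_to_cassandra)
         .option("checkpointLocation", "/tmp/checkpoints/learner-stream")
         .start())
query.awaitTermination()
```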
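And a small RDD sketch of the MapReduce-to-Spark conversion noted above, using a word-count-style job as a stand-in; the HDFS paths are hypothetical.

```python
from pyspark import SparkContext

sc = SparkContext(appName="mapreduce-to-rdd")

# A classic MapReduce counting job re-expressed as RDD transformations
lines = sc.textFile("hdfs:///data/raw/logs/")

counts = (lines
          .flatMap(lambda line: line.split())   # map phase: emit tokens
          .map(lambda token: (token, 1))        # emit (key, 1) pairs
          .reduceByKey(lambda a, b: a + b))     # reduce phase: sum per key

counts.saveAsTextFile("hdfs:///data/derived/token_counts")
```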

Environment: Hadoop, Hive, ZooKeeper, Python, MapReduce, Sqoop, Pig 0.10 and 0.11, JDK 1.6, HDFS, Flume, Oozie, DB2, HBase, Mahout, Unix, Linux

Confidential

Big Data Developer/Admin

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team. Extensively used Pig for data cleansing.
  • Created partitioned tables in Hive. Managed and reviewed Hadoop log files.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting (see the sketch after this list). Installed and configured Pig and wrote Pig Latin scripts.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Load and transform large sets of structured, semi-structured, and unstructured data; responsible for managing data coming from different sources.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
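
A minimal sketch of the partitioned and bucketed Hive work mentioned above, assuming a HiveServer2 endpoint reachable through the PyHive client; the database, table, and column names are hypothetical.

```python
from pyhive import hive

# Connect to HiveServer2 (host and user are placeholders)
conn = hive.connect(host="hive-server", port=10000, username="etl_user")
cursor = conn.cursor()

# Partitioned, bucketed table for cleansed web-server data
cursor.execute("""
    CREATE TABLE IF NOT EXISTS analytics.page_views (
        user_id  STRING,
        url      STRING,
        duration INT
    )
    PARTITIONED BY (view_date STRING)
    CLUSTERED BY (user_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# Load one day's partition from a staging table
# (older Hive versions may also need: SET hive.enforce.bucketing=true)
cursor.execute("""
    INSERT OVERWRITE TABLE analytics.page_views
    PARTITION (view_date = '2015-06-01')
    SELECT user_id, url, duration
    FROM staging.page_views_raw
    WHERE view_date = '2015-06-01'
""")

# Reporting metric that Hive executes as a MapReduce job
cursor.execute("""
    SELECT view_date, COUNT(DISTINCT user_id) AS daily_users
    FROM analytics.page_views
    GROUP BY view_date
""")
print(cursor.fetchall())
```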

Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Java (JDK 1.6), SQL, Sqoop, Eclipse, Git, Unix, Linux, Subversion.
