
Senior Big Data Engineer Resume


Addison, TX

SUMMARY:

  • Around 8 years of IT experience in project development, implementation, deployment, and maintenance using the Big Data Hadoop ecosystem and cloud-related technologies across various sectors, with multi-language programming expertise in Scala, Java, and Python.
  • Hadoop Developer experience in designing and implementing complete end-to-end Hadoop infrastructure using HDFS, MapReduce, HBase, Spark, YARN, Kafka, ZooKeeper, Pig, Hive, Sqoop, Oozie, Kudu, Flume, NiFi, and Kafka Connect.
  • In-depth understanding of Hadoop architecture and its various components, such as JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, and MapReduce concepts.
  • Experience in importing and exporting data between relational database systems and HDFS using Sqoop.
  • Experience with partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Experience with different file formats like Avro, Parquet, ORC, JSON, and XML.
  • Hands-on experience with NoSQL databases like HBase and Cassandra.
  • Experience with Oozie workflows and the Oozie coordinator for scheduling jobs, as well as UNIX shell scripting.
  • Experience in developing Spark applications using Scala and Python.
  • Hands-on experience with different Spark APIs (Core, SQL, Streaming, and Structured Streaming) using Scala and Python.
  • Experience working with different file formats like Parquet, Avro, ORC, JSON, and XML using the Spark API.
  • Experience in developing generic frameworks for data ingestion, data processing, data cleansing and analytic frameworks using Spark.
  • Experience in performance tuning of Spark applications across various aspects.
  • Experience with serializing and deserializing different data file formats using Spark.
  • Experience in using accumulator variables, broadcast variables, and RDD caching for Spark Streaming (illustrated in the sketch after this list).
  • Experience in consuming data from various sources like Kafka, S3, and SFTP servers, and storing it in data stores like HBase, Kudu, Hive, Athena, and DynamoDB.
  • Experience in developing data applications using AWS services like S3, EC2, EMR, Athena, Redshift Spectrum, Redshift, and DynamoDB.
  • Experience working with serverless AWS services like Lambda, Glue, Data Pipeline, and Step Functions.
  • Experience with other AWS services like CloudWatch, CloudTrail, CloudFormation, and SNS.
  • Experience with the CI/CD process using Git, Jenkins, and other repository managers.
  • Experience in writing unit test cases using ScalaTest for the code developed.
  • Good understanding of data warehousing principles, including dimensional modeling, fact tables, dimension tables, data marts, and star/snowflake schemas.
  • Experience applying Scrum, Waterfall, and Agile methodologies; skilled in developing processes that facilitate continual progress and team collaboration.
  • Worked extensively with data migration, data cleansing, data profiling, and ETL processes for data warehouses.
  • Experience in writing SQL queries, data integration, and performance tuning.
  • Experience in the software development and production support life cycles, experience in EDW and data mart projects, and knowledge of OLAP and OLTP systems.
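A minimal PySpark sketch of the broadcast-variable and accumulator usage mentioned above; the lookup data, record values, and application name are made up for illustration:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("broadcast-accumulator-sketch").getOrCreate()
    sc = spark.sparkContext

    # Small lookup table shipped once per executor instead of once per task.
    country_names = sc.broadcast({"US": "United States", "IN": "India"})

    # Accumulator counting records whose code is missing from the lookup.
    misses = sc.accumulator(0)

    def resolve(code):
        name = country_names.value.get(code)
        if name is None:
            misses.add(1)
        return name

    codes = sc.parallelize(["US", "IN", "XX"])
    print(codes.map(resolve).collect())    # ['United States', 'India', None]
    print("lookup misses:", misses.value)  # reliable only after an action has run

    spark.stop()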

TECHNICAL SKILLS:

Big Data Ecosystems: Spark/PySpark, Kafka, Hadoop, MapReduce, HDFS, ZooKeeper, Hive, Sqoop, Oozie, Flume, YARN.

Programming Languages: Python, Java, Scala, Shell, SQL, HTML, CSS

ETL: IBM InfoSphere DataStage

Reporting: Tableau, MS Excel

Databases: RDBMS, HBase, Cassandra, Redshift, NoSQL, S3, CouchDB, and Oracle DB

Tools: Control-M, Talend, Snowflake web UI, AWS S3, CloudWatch, and EMR

Frameworks and Libraries: Django, Flask, Pandas, NumPy, SciPy, PyTorch, scikit-learn, and TensorFlow.

Data Integration: Sqoop, Flume

Distributed File Systems: HDFS, S3

Batch Processing: Hive, MapReduce, Pig, Spark

Cloud Platform: AWS

PROFESSIONAL EXPERIENCE:

Confidential, Addison TX

Senior Big Data Engineer

Responsibilities:

  • Familiarity with the ETL and data transfer technology Apache NiFi.
  • Building and managing a hosted Hadoop big data architecture with Oozie, Sqoop, Hive, Spark, and NiFi.
  • Experience in creating NiFi flows for data ingestion.
  • Experience in data ingestion and data migration.
  • Experience in building data pipelines for large volumes of data using Apache NiFi.
  • Experience with job scheduling and monitoring with Oozie.
  • Expert in implementing and troubleshooting Hive and NiFi applications.
  • Developed an exporter framework in Python to pull scheduler and job-history data from the YARN REST APIs, store it in HBase tables, and generate custom alerts to monitor SLA-bound Hadoop applications (a simplified sketch follows this list).
  • Manage and schedule jobs by defining Hive, Spark, Sqoop, and Python actions on a Hadoop cluster using Oozie workflows and the Oozie coordinator engine.
  • Responsible for data extraction and data ingestion from different data sources into HDFS by creating ETL pipelines.
  • Import and export data into HDFS and Hive using Sqoop.
  • Utilize Spark SQL to extract and process data using Datasets or RDDs with transformations and actions (map, flatMap, filter, reduce, reduceByKey).
  • Develop Oozie actions such as Hive, Spark, and Java actions to submit and schedule applications on the Hadoop cluster.
  • Resolve Spark and YARN resource management issues, including shuffle issues, out-of-memory and heap space errors, and schema compatibility problems.
  • Monitor and troubleshoot application performance, take corrective action in case of failures, and evaluate possible enhancements to meet SLAs.
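
A simplified sketch of the kind of YARN exporter described above, assuming a ResourceManager at rm-host:8088 and an HBase table named yarn_apps (column family app) reachable through an HBase Thrift server; all host, table, and SLA values are illustrative, not the actual framework:

    import requests
    import happybase   # HBase client; assumes an HBase Thrift server is running

    RM_APPS_URL = "http://rm-host:8088/ws/v1/cluster/apps"   # hypothetical ResourceManager
    SLA_MS = 30 * 60 * 1000                                  # hypothetical 30-minute SLA

    def fetch_apps(states="RUNNING,FINISHED"):
        """Pull application metadata from the YARN ResourceManager REST API."""
        resp = requests.get(RM_APPS_URL, params={"states": states}, timeout=30)
        resp.raise_for_status()
        apps = resp.json().get("apps") or {}
        return apps.get("app") or []

    def store_and_alert(apps):
        """Write one HBase row per application and print an alert on SLA breaches."""
        conn = happybase.Connection("hbase-host")   # hypothetical Thrift host
        table = conn.table("yarn_apps")             # assumed table with column family 'app'
        for app in apps:
            table.put(app["id"].encode(), {
                b"app:name": app["name"].encode(),
                b"app:state": app["state"].encode(),
                b"app:elapsed_ms": str(app["elapsedTime"]).encode(),
            })
            if app["elapsedTime"] > SLA_MS:
                print(f"ALERT: {app['id']} exceeded SLA ({app['elapsedTime']} ms)")
        conn.close()

    if __name__ == "__main__":
        store_and_alert(fetch_apps())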

Environment: Hadoop, Spark (Core, SQL, Streaming; Scala and Python), Apache NiFi, AWS, Cassandra, Oracle, MongoDB, SQL Server, DB2, Oozie, and Linux

Confidential, Charlotte NC

Big Data Engineer

Responsibilities:

  • Evaluated business requirements and prepared detailed design documents following project guidelines and SLAs, which required procuring data from all upstream data sources and developing the corresponding programs.
  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, Hive, HBase, and Sqoop.
  • Installed Hadoop (MapReduce, HDFS) and developed multiple MapReduce jobs in Pig and Hive for data cleaning and preprocessing.
  • Coordinated with business customers to gather business requirements, and interacted with technical peers to derive technical requirements and deliver the BRD and TDD documents.
  • Extensively involved in the Design phase and delivered Design documents.
  • Experienced in writing Hadoop jobs for analyzing data using HiveQL queries, Pig Latin (a data flow language), and custom MapReduce programs in Java.
  • Involved in testing and coordinated with the business on user testing.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
  • Developed and designed a system to collect data from multiple portals using Kafka and then process it using Spark (the producer side is sketched after this list).
  • Involved in creating Hive tables and applying HiveQL queries on them, which automatically invoke and run MapReduce jobs.
  • Used Kafka for building real-time data pipelines between clusters.
  • Ran log aggregation, website activity tracking, and commit logs for distributed systems using Apache Kafka.
  • Involved in creating Hive tables, loading data, and writing Hive queries that run internally as MapReduce jobs.
  • Experienced in defining job flows.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Experienced in managing and reviewing the Hadoop log files.
  • Used Pig as an ETL tool to do transformations, event joins, and some pre-aggregations before storing the data in HDFS.
  • Extended Hive and Pig core functionality with custom user-defined functions (UDFs), user-defined table-generating functions (UDTFs), and user-defined aggregate functions (UDAFs).
  • Worked on designing and deploying Hadoop clusters and different big data analytic tools, including Pig, Hive, Oozie, ZooKeeper, Sqoop, Flume, Impala, and Cassandra, with the Hortonworks distribution.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs; wrote shell scripts to automate the jobs in UNIX.
  • Created HBase tables to store various data formats of incoming data from different portfolios.
  • Created Pig Latin scripts to sort, group, join and filter the enterprise-wide data.
  • Developed the verification and control process for daily load.
  • Provided daily production support to monitor and troubleshoot Hadoop/Hive jobs.
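
A minimal sketch of the producer side of the Kafka-to-Spark portal pipeline described above, using the kafka-python client; the broker addresses, topic name, and payload shape are assumptions for illustration:

    import json
    import time
    from kafka import KafkaProducer   # kafka-python package

    # Broker list and topic name are illustrative assumptions.
    producer = KafkaProducer(
        bootstrap_servers=["broker1:9092", "broker2:9092"],
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    def publish_portal_event(portal, payload):
        """Send one portal event to the shared ingestion topic, keyed by portal name."""
        producer.send(
            "portal-events",                # assumed topic consumed downstream by Spark
            key=portal.encode("utf-8"),
            value={"portal": portal, "ts": int(time.time()), "data": payload},
        )

    publish_portal_event("customer-portal", {"page": "/home", "user": "u123"})
    producer.flush()   # block until buffered records are delivered to the brokers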

Environment: Spark, Hadoop, HDFS, Hive, Kafka, Sqoop, Scala, Cassandra, Oozie, Cloudera, IBM InfoSphere DataStage, Flume, Netezza, Linux, Control-M, Oracle, DB2, AWS.

Confidential, Johnston, RI

Data Engineer

Responsibilities:

  • Developed and designed jobs using DataStage Designer as per the mapping specifications by using appropriate stages.
  • Designed ETL processes to extract the source data and load it into the data warehouse after cleansing, transforming, and integrating it. Imported metadata from the repository and created routines, new job categories, and data elements.
  • Worked on designing, developing, documenting, and testing ETL jobs and mappings, in both sequence jobs and parallel jobs, using DataStage to populate tables in data warehouses and data marts.
  • Designed jobs using different parallel job stages such as Join, Merge, Lookup, Remove Duplicates, Copy, Filter, Funnel, Dataset, Lookup File Set, Change Data Capture, Modify, and Aggregator.
  • Worked on the integration of various data sources (Oracle, DB2, Teradata, and SQL Server) into a data staging area.
  • Collaborated with the EDW team on high-level design documents for the extract, transform, validate, and load (ETL) process, data dictionaries, metadata descriptions, file layouts, and flow diagrams.
  • Generated Surrogate Keys for the fact and dimension tables for faster access and indexing of data in the Data Warehouse. Tuned Transformations and jobs for performance enhancement.
  • Created Batches and Sequences to control a set of jobs.
  • Developed UNIX scripts to automate file manipulation and data loading procedures.
  • Optimized performance in a large database environment utilizing Parallelism through different partition methods.
  • Performed Unit testing to ensure DataStage jobs met the requirements.

Environment: IBM InfoSphere DataStage 9.1 and above (Administrator, Designer, Director), SQL 2008, Oracle 11g, Shell Scripts.

Confidential

Big Data Engineer / Hadoop Developer

Responsibilities:

  • Interacted with business partners, business analysts, and product owners to understand requirements and build scalable distributed data solutions using the Hadoop ecosystem.
  • Developed Spark Streaming programs to process near real-time data from Kafka, processing the data with both stateless and stateful transformations (a minimal Structured Streaming sketch follows this list).
  • Worked with the Hive data warehouse infrastructure: creating tables, distributing data by implementing partitioning and bucketing, and writing and optimizing HiveQL queries.
  • Built and implemented automated procedures to split large files into smaller batches of data to facilitate FTP transfer, which reduced execution time by 60%.
  • Worked on developing ETL processes to load data from multiple data sources into HDFS using Flume and Sqoop, and performed structural modifications using MapReduce and Hive.
  • Developed Spark scripts and UDFs using both the Spark DSL and Spark SQL for data aggregation and querying, and wrote data back into the RDBMS through Sqoop.
  • Strong understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Developed ETL pipelines into and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, writing SQL queries against Snowflake.
  • Used Hive to analyze data ingested into HBase by using Hive-HBase integration and to compute various metrics for reporting on the dashboard.
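
A minimal PySpark Structured Streaming sketch of the Kafka consumption pattern referenced above; it uses the Structured Streaming API rather than the older DStream-based Spark Streaming API, and the broker address, topic name, and aggregation are illustrative assumptions:

    from pyspark.sql import SparkSession

    # Assumes the spark-sql-kafka connector is on the classpath
    # (e.g. submitted with --packages org.apache.spark:spark-sql-kafka-0-10_2.12:<spark version>).
    spark = SparkSession.builder.appName("kafka-streaming-sketch").getOrCreate()

    # Read the raw event stream from Kafka; broker and topic names are assumptions.
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "events")
              .load()
              .selectExpr("CAST(value AS STRING) AS raw"))

    # A simple stateful aggregation: running count of events per distinct value.
    counts = events.groupBy("raw").count()

    # Write the continuously updated counts to the console for the sketch.
    query = (counts.writeStream
             .outputMode("complete")
             .format("console")
             .start())
    query.awaitTermination()
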
Environment: AWS, PySpark, Apache Spark, HBase, Apache Kafka, Hive, Sqoop, Flume, Apache Oozie, ETL, UDFs, MapReduce, Python
