
Big Data Engineer Resume


San Francisco, CA

SUMMARY

  • Over 8 years of IT experience across various industries, with hands-on experience in designing, developing, and implementing Big Data applications.
  • Hands-on experience in Spark, Data Integration, Data Analysis, Data Modeling, Data Design, Data Governance, Data Quality and Data Reporting.
  • Experience working with Import and DirectQuery modes and creating custom tables in Power BI.
  • Good working experience in designing Apache Airflow workflows for cleaning data and storing it into Hive tables for quick analysis.
  • Good working knowledge of Databricks, Data Lakes and Data Factory.
  • Developed PySpark ETL pipelines from different data sources such as MongoDB, Redis cache, Microsoft Event Hubs, and Kafka topics to perform both batch and real-time streaming using Spark Streaming.
  • Good experience working with different Hadoop file formats like Sequence File, ORC, AVRO, and Parquet.
  • Experience in importing and exporting data using Sqoop between HDFS and RDBMS and vice versa.
  • Experience in developing scalable solutions using NoSQL databases including MongoDB, HBase, and Cassandra.
  • Hands-on experience with real-time streaming to Kafka, Azure Event Hubs, and AWS SNS topics using Spark Streaming.
  • Hands-on experience with AWS and Azure cloud services, including AWS Lambda, AWS Key Management Service, AWS SNS topics, AWS S3 buckets, AWS SQS queues, Azure Event Hubs and Event Grid, Azure Functions, Azure Service Bus, Azure API Management, Azure Key Vault, Azure Kubernetes Service, Azure DevOps, Azure storage accounts, and Azure Data Factory.
  • Install, configure, test, monitor, upgrade, and tune new and existing PostgreSQL databases.
  • Experience in using Spark SQL to convert schema-less data into more structured files for further analysis (see the sketch after this list).
  • Hands-on experience in converting Hive/SQL queries into Spark transformations using Spark RDDs and Spark SQL.
  • Experience in using Spark Streaming to receive real-time data and store the stream into HDFS.
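
A minimal PySpark sketch of the schema-less-to-structured conversion referenced above; the file paths, JSON fields, and column names are illustrative assumptions rather than project specifics:

```python
# Sketch: read semi-structured JSON, select and type the useful fields,
# and write Parquet for downstream analysis (paths/fields are hypothetical).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("json-to-parquet").getOrCreate()

# Schema-less/nested JSON input; multiLine handles records spanning lines.
raw = spark.read.option("multiLine", "true").json("/data/raw/events/")

# Flatten and type the fields of interest.
structured = (
    raw.select(
        F.col("event.id").alias("event_id"),
        F.col("event.type").alias("event_type"),
        F.to_timestamp("event.ts").alias("event_time"),
    )
    .where(F.col("event_id").isNotNull())
)

# Columnar output for efficient querying from Hive/Spark SQL.
structured.write.mode("overwrite").parquet("/data/curated/events/")
```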

TECHNICAL SKILLS

Big Data: Spark, HDFS, Kafka, MapReduce, Hive, Impala, HBase, Sqoop, YARN

Programming languages: Python, SQL, PL/SQL.

Web Services & Technologies: RESTful.

RDBMS: Oracle 10g, PostgreSQL 9.x, Palantir

Cloud Technologies: AWS and Azure

ETL tools: Talend, Informatica (MDM, IDQ, TPT)

Databases: Oracle, MySQL, DB2, MongoDB, Redis, Teradata.

Microsoft Office: Word, Excel, PowerPoint, Visio, Teams.

Operating Systems: Windows, UNIX, Linux, Mac OS.

PROFESSIONAL EXPERIENCE

Confidential, San Francisco, CA

Big Data Engineer

Responsibilities:

  • Developing PySpark ETL pipelines on AWS from different data sources such as Snowflake, MongoDB, Redis cache, Microsoft Event Hubs, and Kafka topics to perform both batch and real-time streaming with Spark in a feature store application for real-time machine learning model serving and downstream ML pipelines.
  • Experience in developing Power BI reports and dashboards from multiple data sources using data blending.
  • Developed various solution-driven views and dashboards with different chart types, including pie charts, bar charts, tree maps, circle views, line charts, area charts, and scatter plots in Power BI.
  • Developing different Spark pipelines in Databricks, scheduling notebook jobs for daily syncs, and monitoring the feature engineering pipelines.
  • Imported/exported data from RDBMS to HDFS using Data Ingestion tools like Sqoop.
  • Developing an enterprise-level automated sync pipeline for migrating real-time streaming features to an offline Delta Lake (see the streaming sketch after this list).
  • Migrated SQL queries to Spark using Spark RDDs, Scala, and Python.
  • Consumed RESTful web services through the CLI to exchange data in different formats with external interfaces.
  • Implemented several design patterns to make the code robust and highly scalable, and used high-performance data structures to keep the application fast.
  • Developing scripts with modern Big Data tools like PySpark and Spark SQL to convert schema-less data into more structured files for further analysis.
  • Developed and successfully deployed many modules on Spark, Scala, and Python.
  • Extensively worked on combiners, partitioning, and distributed cache to improve the performance of Spark jobs.
  • Successfully handled different file formats including CSV, JSON (single-line, multiline, and nested), text, Avro, and Parquet.
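
A hedged sketch of the Kafka-to-Delta Lake sync pattern used in the feature store work above, assuming a Databricks-style runtime with the Delta format available; the broker, topic, schema, and paths are illustrative:

```python
# Sketch: stream feature events from Kafka and append them to an offline Delta table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("feature-sync").getOrCreate()

# Hypothetical feature event payload.
feature_schema = StructType([
    StructField("entity_id", StringType()),
    StructField("feature_name", StringType()),
    StructField("feature_value", DoubleType()),
])

stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "feature-events")               # placeholder topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), feature_schema).alias("f"))
    .select("f.*")
)

# Append to the offline Delta Lake location with checkpointing for exactly-once sync.
(
    stream.writeStream.format("delta")
    .option("checkpointLocation", "/delta/_checkpoints/features")
    .outputMode("append")
    .start("/delta/offline_features")
)
```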

Confidential, DC

Big Data Engineer

Responsibilities:

  • Providing data integration, data ingestion for the use cases required for the Palantir analytics.
  • Writing code for data transformation and data integration required for Palantir analytics.
  • Data validation and quality check for data consumed by Palantir use cases.
  • Created Hive external tables in partitioned format to load the processed data obtained from MapReduce (see the sketch after this list).
  • Specialized in transforming data into user-friendly visualizations to give business users a complete view of their business using Power BI.
  • Used various sources to pull data into Power BI such as SQL Server, Oracle.
  • Generated final reporting data in Tableau for testing by connecting to the corresponding Hive tables using the Hive ODBC connector.
  • Wrote Hive jobs to parse logs and structure them in a tabular format to facilitate effective querying of the log data.
  • Created AWS Lambda functions around external application APIs to monitor their health status and log the monitoring results to Kibana.
  • Created KSQL and Mustache-scripted queries to filter alerts from Kibana and send the stream of health events to AWS SQS queues for real-time incident reporting through PagerDuty.
  • Developed analytical components using Spark SQL, Spark Streaming, and PySpark.
  • Created UNIX shell scripts (Bash/Korn shell) for parameterizing Sqoop and Hive jobs.
  • Developing different Spark pipelines in Databricks and scheduling the ETL notebook jobs for daily syncs to S3 buckets.
  • Developed custom scheduled jobs and daily sync activities on Palantir.
  • Implemented Kafka for broadcasting the logs generated using spark streaming.
  • Involved in loading data into HBase from Hive tables to see the performance.
  • Involved in creating Hive tables and loading data as text, Parquet, and ORC for use in Hive queries.
  • Created and maintained technical documentation for launching Hadoop Clusters and for executing Pig Scripts.
  • Commissioning and decommissioning nodes in the Hadoop cluster.
  • Good experience in troubleshooting cluster issues.
  • Worked on and developed customizations for a trained SSP analyzer model algorithm in PL/SQL.
  • Followed Agile & Scrum principles in developing the project.

Confidential, MD

Big Data Engineer

Responsibilities:

  • Designed and developed services to persist and read data from HDFS, Hadoop, and Hive, and wrote Java-based MapReduce batch jobs.
  • Designed and developed a generic data grid framework by gathering data from various data sources on AWS.
  • Server-side coding and development using Spring, exception handling, and Java Collections including List and Map.
  • Stored user profiles and other unstructured data using Java and MongoDB.
  • Created and maintained technical documentation for launching Hadoop and Python jobs.
  • Involved in managing deployments using XML scripts.
  • Developed Spark SQL scripts and was involved in converting Hive UDFs to Spark SQL UDFs (see the UDF sketch after this list).
  • Performed operations on data stored in HDFS and other NoSQL databases in both batch-oriented and ad-hoc contexts.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing; involved in loading data from the Linux file system into HDFS using PySpark.
  • Ran batch processes using Pig scripts and developed Pig UDFs for data manipulation per business requirements.
  • Built a workflow to export Cassandra column family data to CSV and load it into Pig; used the Avro data serialization system to work with JSON data formats.
  • Accessed Hive tables to perform analytics from Java applications using JDBC.
  • Used the partitioning pattern in MapReduce to move records into categories.
  • Commissioning and decommissioning nodes in the Hadoop cluster.
  • Imported/exported data from RDBMS to HDFS using Data Ingestion tools like Sqoop.
  • Created and maintained technical documentation for launching Hadoop Clusters and for executing Pig Scripts.
  • Testing: unit testing with JUnit and integration testing in the staging environment.
  • Followed Agile & Scrum principles in developing the project and AWS.

Confidential, GA

Data Engineer

Responsibilities:

  • Created complex program units using PySpark, Python, PL/SQL records, and collection types.
  • Created database objects like Tables, Views, Materialized views, Procedures.
  • Extracted data from Teradata.
  • Created UNIX Shell Scripts for automated execution of batch process.
  • Handled complex exceptions and developed packages using Oracle utilities like PL/SQL, SQL*Plus, and SQL*Loader.
  • Designed process flows to record the dependencies between mapping runs on AWS.
  • Wrote technical design documents for the future development process.
  • Used nested tables and arrays in complex backend packages.
  • Designed data flow, entity relationship, and other diagrams with Erwin, Visio, and SDPRO.
  • Worked on the production support team for EOM process issues.
  • Coordinated with DBA in creating and managing Tables, Indexes, db links and Privileges.
  • Testing and taking measures for maximum performance utilizing optimization hints, indexes, partitions, parallelism, storage, and query optimization.
  • Experience in Oracle Advanced Queuing and AWS.
  • Worked on performance tuning of SQL, PL/SQL and analyzed Tables.
  • Detailed user requirement analysis and design of systems.
  • Developed application interfaces using PL/SQL stored packages (a hedged invocation sketch follows this list).
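
A hedged sketch of invoking a PL/SQL packaged procedure from Python via cx_Oracle; the DSN, package, procedure, and parameter names are illustrative assumptions rather than the project's actual objects:

```python
# Sketch: call a PL/SQL packaged procedure and read back an OUT status value.
import cx_Oracle

connection = cx_Oracle.connect(user="app_user", password="app_pwd",
                               dsn="dbhost:1521/ORCLPDB1")  # placeholder credentials/DSN

with connection.cursor() as cursor:
    # OUT parameter to receive a status code from the packaged procedure.
    status = cursor.var(int)
    cursor.callproc("billing_pkg.run_eom_process", ["2020-01", status])  # hypothetical package
    print("EOM process status:", status.getvalue())

connection.commit()
connection.close()
```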
