
Big Data Engineer Resume


Chicago, IL

SUMMARY

  • 8+ years of overall software development experience in Big Data technologies, the Hadoop ecosystem, and SQL, with programming experience in Python, Scala, and Java.
  • 3+ years of strong hands-on experience with the Hadoop ecosystem, including Spark, MapReduce, Hive, Pig, HDFS, YARN, HBase, Oozie, Kafka, and Sqoop.
  • Experience in architecting, designing, and building distributed data pipelines.
  • Deep knowledge of troubleshooting and tuning Spark applications and Hive scripts to achieve optimal performance.
  • Worked with real-time data processing and streaming techniques using Spark Streaming and Kafka.
  • Experience moving data into and out of HDFS and relational database management systems (RDBMS) using Apache Sqoop.
  • Experience developing Kafka producers and consumers that stream millions of events per second.
  • Significant experience writing custom UDFs in Hive and custom InputFormats in MapReduce.
  • Knowledge of job workflow scheduling and monitoring tools like Oozie, Airflow.
  • Experience using various Hadoop Distributions (Cloudera, Hortonworks, Amazon AWS EMR) to fully implement and utilize various Hadoop services.
  • Experience working with NoSQL databases like MongoDB, Cassandra and HBase.
  • Used Hive extensively for performing various data analytics required by business teams.
  • Solid experience working with various data formats such as Parquet, ORC, Avro, and JSON.
  • Good experience in designing and implementing end-to-end data security and governance within the Hadoop platform using Kerberos.
  • Hands-on experience developing end-to-end Spark applications using Spark APIs such as RDDs, the DataFrame API, MLlib, Spark Streaming, and Spark SQL.
  • Good experience working with various data analytics and big data services in the AWS Cloud, such as EMR, Redshift, S3, Athena, and Glue.
  • Good understanding of Spark ML algorithms such as Classification, Clustering, and Regression.
  • Experienced in migrating data warehousing workloads into Hadoop based data lakes using MR, Hive, Pig and Sqoop.
  • Set up build and deployment automation for Java-based projects using Jenkins.
  • Developed Spark applications in Python (PySpark) on a distributed environment to load large numbers of CSV files with differing schemas into Hive ORC tables (see the sketch after this list).
  • Experience maintaining an Apache Tomcat, MySQL, LDAP, and web service environment.
  • Designed ETL workflows in Tableau and deployed data from various sources to HDFS.
  • Good experience with use-case development and software methodologies such as Agile and Waterfall.
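
A minimal PySpark sketch of the CSV-to-ORC load described above. The paths, table name, and the unionByName approach to reconciling differing schemas are illustrative assumptions, not the original project's code (unionByName with allowMissingColumns requires Spark 3.1+):

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("csv-to-hive-orc")          # hypothetical job name
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical landing paths; each feed may carry a different schema.
paths = ["s3://example-bucket/landing/feed_a/",
         "s3://example-bucket/landing/feed_b/"]
frames = [spark.read.option("header", "true").csv(p) for p in paths]

# Align the differing schemas by column name, filling columns missing
# from either side with nulls.
df = frames[0]
for other in frames[1:]:
    df = df.unionByName(other, allowMissingColumns=True)

# Append into a Hive-managed ORC table (table name is an assumption).
df.write.format("orc").mode("append").saveAsTable("analytics.events_orc")
```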

TECHNICAL SKILLS

Big Data Technologies / Hadoop Components: HDFS, Hue, MapReduce, YARN, Sqoop, Pig, Hive, HBase, Oozie, Kafka, Impala, Zookeeper, Flume, Cloudera Manager, Airflow

Spark: Spark SQL, Spark Streaming, DataFrames, YARN, Pair RDDs

Cloud Services: AWS (S3, EC2, EMR, Lambda, RedShift, Glue), Azure (Azure Data Factory / ETL / ELT / SSIS, Azure Data Lake Storage, Azure Databricks)

Programming Languages: SQL, PySpark, Python, Scala, Java

Databases: Oracle, MySQL, DB2, SQL Server, Teradata

NoSQL Databases: HBase, Cassandra, MongoDB

Web Technologies: HTML, JDBC, JavaScript, CSS

Version Control Tools: GitHub, Bitbucket

Server-Side Scripting: UNIX Shell, PowerShell Scripting

IDE: Eclipse, PyCharm, Notepad++, IntelliJ, Visual Studio

Operating Systems: Linux, Unix, Ubuntu, Windows, CentOS

PROFESSIONAL EXPERIENCE

Big Data Engineer

Confidential - Chicago, IL

Responsibilities:

  • Working as a Data Engineer using Big Data and Hadoop ecosystem components to build highly scalable data pipelines.
  • Worked in Agile development environment and participated in daily scrum and other design related meetings.
  • Involved in converting Hive/SQL queries into Spark transformations using PySpark.
  • Worked on Spark SQL, created DataFrames by loading data from Hive tables, prepared the data, and stored it in AWS S3.
  • Responsible for loading customer data and event logs from Kafka into Redshift through Spark Streaming (see the sketch after this list).
  • Developed batch scripts to fetch data from AWS S3 and perform the required transformations in Scala using the Spark framework.
  • Optimized Hive queries using best practices and appropriate parameters, along with technologies such as Hadoop, YARN, Python, and PySpark.
  • Worked on reading and writing multiple data formats such as JSON, ORC, and Parquet on HDFS using PySpark.
  • Analyzed SQL scripts and redesigned them using PySpark SQL for faster performance.
  • Created Sqoop Scripts to import and export customer profile data from RDBMS to S3 buckets.
  • Built custom input adapters to migrate clickstream data from FTP servers to S3.
  • Developed various enrichment applications in Spark using Scala to cleanse and enrich clickstream data with customer profile lookups.
  • Troubleshooting Spark applications for improved error tolerance and reliability.
  • Used the Spark DataFrame API to implement batch processing of jobs.
  • Worked on fine-tuning and performance enhancement of various Spark applications and Hive scripts.
  • Used Spark concepts such as broadcast variables, caching, and dynamic allocation to design more scalable Spark applications.
  • Implemented continuous integration and deployment using CI/CD tools such as Jenkins, Git, and Maven.
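
A minimal sketch of the Kafka-to-Redshift pipeline mentioned above, written with Spark Structured Streaming. The broker address, topic, event schema, and S3 prefixes are assumptions; the sketch stages micro-batches to S3 as Parquet, a common pattern ahead of a Redshift COPY (the COPY step itself is not shown):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("customer-event-stream").getOrCreate()

# Hypothetical schema for the customer event logs.
event_schema = StructType([
    StructField("customer_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

# Read the raw stream from Kafka (broker and topic are placeholders).
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "customer-events")
          .load()
          .select(from_json(col("value").cast("string"), event_schema).alias("e"))
          .select("e.*"))

# Stage micro-batches to S3 as Parquet; a downstream Redshift COPY
# from this prefix would complete the load.
query = (events.writeStream
         .format("parquet")
         .option("path", "s3://example-bucket/staged/customer-events/")
         .option("checkpointLocation", "s3://example-bucket/checkpoints/customer-events/")
         .start())
query.awaitTermination()
```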

Environment: AWS EMR, S3, Spark, Hive, Sqoop, Scala, Java, MySQL, Oracle DB, Athena, Redshift.

Spark Developer

Confidential - Minneapolis, MN

Responsibilities:

  • Data ingestion into the data lake using an open-source Hadoop distribution to process structured, semi-structured, and unstructured datasets.
  • Wrote Hive queries and created user-defined aggregate functions; worked on advanced optimization techniques and have extensive knowledge of joins.
  • Created Hive scripts to extract, transform, load (ETL), and store the data using Talend.
  • Developed Sqoop Scripts to extract data from DB2 EDW source databases onto HDFS.
  • Worked with Oracle and Teradata for data import/export operations from different data marts.
  • Worked extensively with data migration, data cleansing, data profiling, and ETL processes for data warehouses.
  • Hands-on experience in the Azure Cloud: stored data in ADLS (Azure Data Lake Storage) and worked with App Services, Databricks clusters for running jobs, Azure SQL Database, virtual machines, the fabric controller, Azure AD, Azure Search, and Notification Hubs.
  • Designed, configured, and deployed Microsoft Azure for a multitude of applications using the Azure stack (including Compute, Web & Mobile, Blobs, Resource Groups, Azure SQL, Cloud Services, and ARM), focusing on high availability, fault tolerance, and auto-scaling.
  • Developed Spark applications using Python (PySpark).
  • Developed and implemented API services using Python in Spark.
  • Created partitions and buckets based on state for further processing using bucket-based Hive joins (see the sketch after this list).
  • Responsible for continuous monitoring and management of the Elastic MapReduce cluster through the AWS console.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Knowledge of handling Hive queries using Spark SQL integrated with the Spark environment.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
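
A minimal PySpark sketch of the partitioning and bucketing step referenced above. The source table, column names, and bucket count are illustrative assumptions; bucketing by the join key lets similarly bucketed tables join without a full shuffle:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("bucketed-hive-table")      # hypothetical job name
         .enableHiveSupport()
         .getOrCreate())

# Source table and column names are assumptions.
customers = spark.table("staging.customers")

# Partition by state and bucket by customer_id; bucketBy requires
# writing through saveAsTable.
(customers.write
    .partitionBy("state")
    .bucketBy(16, "customer_id")
    .sortBy("customer_id")
    .format("orc")
    .mode("overwrite")
    .saveAsTable("analytics.customers_bucketed"))
```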

Environment: Hadoop, Hive, Talend, MapReduce, Pig, Salesforce, Sqoop, Splunk, CDH5, Python, HDFS, DB2, Oozie, Putty, Java.

Data Engineer

Confidential - Scottsdale, AZ

Responsibilities:

  • Built new universes in Business Objects per user requirements by identifying the required tables from the data mart and defining the universe connections.
  • Used Business Objects to create reports based on SQL queries. Generated executive dashboard reports with the latest company financial data by business unit and by product.
  • Performed data analysis and mapping, database normalization, performance tuning, query optimization, data extraction, transfer, and loading (ETL), and cleanup.
  • Implemented Teradata RDBMS analysis with Business Objects to develop reports, interactive drill charts, balanced scorecards, and dynamic dashboards.
  • Responsible for requirements gathering, status reporting, and creating various metrics and project deliverables.
  • Developed PL/SQL procedures, functions, and packages, and used SQL*Loader to load data into the database (see the sketch after this list).
  • Designed and developed Informatica mappings to load data from source systems. Worked with Informatica PowerCenter tools: Source Analyzer, Warehouse Designer, Mapping/Mapplet Designer, and Transformation Designer.
  • Involved in migrating warehouse database from Oracle 9i to 10g database.
  • Involved in analyzing and adding new Oracle 10g features such as DBMS_SCHEDULER, CREATE DIRECTORY, Data Pump, and CONNECT BY ROOT to the existing Oracle 9i application.
  • Tuned report performance by exploiting Oracle's new built-in functions and rewriting SQL statements.
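
SQL*Loader itself is driven by a control file rather than application code; to keep this document's examples in one language, here is a hypothetical Python analogue of the bulk-load step using the python-oracledb driver. The connection details, staging table, and columns are all assumptions:

```python
import oracledb

# Placeholder connection details.
conn = oracledb.connect(user="etl_user", password="***", dsn="dbhost/ORCLPDB1")

# Sample rows standing in for a SQL*Loader data file.
rows = [
    ("C001", "2010-01-15", 1250.00),
    ("C002", "2010-01-16", 310.50),
]

with conn.cursor() as cur:
    # executemany batches the binds, broadly analogous to a
    # conventional-path SQL*Loader load of the same rows.
    cur.executemany(
        "INSERT INTO sales_stage (customer_id, sale_date, amount) "
        "VALUES (:1, TO_DATE(:2, 'YYYY-MM-DD'), :3)",
        rows,
    )
conn.commit()
```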

Environment: SQL Server, J2EE, UNIX, .NET, MS Project, Oracle, WebLogic, shell script, JavaScript, HTML, Microsoft Office Suite 2010, Excel

Hadoop Developer

Confidential

Responsibilities:

  • Involved in requirements analysis and the design of an object-oriented domain model.
  • Designed use case diagrams, class diagrams, sequence diagrams, and object diagrams.
  • Involved in designing user screens using HTML as per user requirements.
  • Used Spring-Hibernate integration in the back end to fetch data from Oracle and MySQL databases.
  • Used Spring Dependency Injection properties to provide loose-coupling between layers.
  • Implemented the Web Service client for the login authentication, credit reports and applicant information.
  • Used Web services (SOAP) for transmission of large blocks of XML data over HTTP.
  • Implemented the logging mechanism using Log4j framework.
  • Wrote test cases in JUnit for unit testing of classes.
  • Developed the application to run on Windows XP.
  • Created application using Eclipse IDE.
  • Installed WebLogic Server to handle HTTP requests and responses.
  • Used Subversion for version control and created automated build scripts.
