
Senior Big Data Engineer Resume


Phoenix, AZ

SUMMARY

  • Overall 7+ years of professional experience in Information Technology, with expertise in Big Data using the Hadoop framework and in analysis, design, development, testing, documentation, deployment, and integration using SQL and Big Data technologies.
  • Experience in using different Hadoop eco system components such as HDFS, YARN, MapReduce, Spark, Sqoop, Hive and Kafka.
  • Experienced in using distributed computing architectures such as AWS products (e.g., EC2, Redshift, EMR, Elasticsearch), Hadoop, Python, and Spark, with effective use of MapReduce, SQL, and Cassandra to solve big data problems.
  • Experienced in using various Python libraries such as NumPy, SciPy, python-twitter, and Pandas.
  • Experience in developing custom UDFs for Pig and Hive to incorporate methods and functionality of Python/Java into Pig Latin and HQL (HiveQL), and in using UDFs from the Piggybank UDF repository.
  • Experience with Spark Streaming and writing Spark jobs (see the streaming sketch after this list).
  • Experience with data warehousing and data mining using one or more NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Used RStudio for data pre-processing and building machine learning algorithms on datasets.
  • Good knowledge of NLP, statistical models, machine learning, and data mining solutions to various business problems, developed using R and Python.
  • Strong experience in core Java, Scala, SQL, PL/SQL, and RESTful web services.
  • Good understanding of JIRA and experience maintaining JIRA dashboards.
  • Knowledge of Java IDEs such as Eclipse and IntelliJ.
  • Used Maven for building projects.
  • Developed DAGs and automated the process for the data science teams.
  • Developed data pipelines using ETL tools SQL Server Integration Services (SSIS), Microsoft Visual Studio (SSDT)
  • Experience in designing visualizations using Tableau software and storyline, publishing and presenting dashboards.
  • Expert in building enterprise data warehouses and data warehouse appliances from scratch using both the Kimball and Inmon approaches.
  • Good experience with use-case development, with Software methodologies like Agile and Waterfall.
  • Active team player with excellent interpersonal skills; keen learner with self-commitment and innovation.
  • Proven ability to manage all stages of project development; strong problem-solving and analytical skills and the ability to make balanced and independent decisions.
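A minimal PySpark Structured Streaming sketch illustrating the kind of Spark streaming job referenced above. The broker address, topic name, event schema, and output paths are illustrative assumptions rather than details from any specific engagement, and the Kafka connector package is assumed to be on the Spark classpath.

```python
# Sketch: consume JSON events from Kafka with Spark Structured Streaming and
# land them as Parquet. Broker, topic, schema, and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_ts", StringType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker
       .option("subscribe", "events")                      # assumed topic
       .load())

events = (raw.selectExpr("CAST(value AS STRING) AS json")
             .select(from_json(col("json"), event_schema).alias("e"))
             .select("e.*"))

query = (events.writeStream
         .format("parquet")
         .option("path", "/data/events")               # assumed output path
         .option("checkpointLocation", "/chk/events")  # assumed checkpoint dir
         .start())
query.awaitTermination()
```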

TECHNICAL SKILLS

Big Data/Hadoop Technologies: MapReduce, Spark, Spark SQL, Azure, Spark Streaming, Kafka, PySpark, Pig, Hive, HBase, Flume, YARN, Oozie, Zookeeper, Hue, Ambari Server

Languages: HTML5, DHTML, WSDL, CSS3, C, C++, XML, R/RStudio, SAS Enterprise Guide, SAS, R (Caret, Weka, ggplot), Perl, MATLAB, Mathematica, FORTRAN, DTD, Schemas, JSON, Ajax, Java, Scala, Python (NumPy, SciPy, Pandas, Gensim, Keras), JavaScript, Shell Scripting

NoSQL Databases: Cassandra, HBase, MongoDB, MariaDB

Web Design Tools: HTML, CSS, JavaScript, JSP, jQuery, XML

Development Tools: Microsoft SQL Studio, IntelliJ, Azure Databricks, Eclipse, NetBeans.

Public Cloud: EC2, IAM, S3, Autoscaling, CloudWatch, Route53, EMR, RedShift

Development Methodologies: Agile/Scrum, UML, Design Patterns, Waterfall

Build Tools: Jenkins, Toad, SQL Loader, PostgreSQL, Talend, Maven, ANT, RTC, RSA, Control-M, Oozie, Hue, SOAP UI

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos

Databases: Microsoft SQL Server, MySQL, Oracle, DB2, Teradata, Netezza

Operating Systems: All versions of Windows, UNIX, LINUX, Macintosh HD, Sun Solaris

PROFESSIONAL EXPERIENCE

Confidential, Phoenix, AZ

Senior Big Data Engineer

Responsibilities:

  • Used Hive joins and HQL for querying the databases, eventually developing complex Hive UDFs.
  • Installed the OS and administered the Hadoop stack on the CDH5 (with YARN) Cloudera distribution, including configuration management, monitoring, debugging, and performance tuning.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
  • Worked on installing Cloudera Manager and CDH, installing the JCE policy file, creating a Kerberos principal for the Cloudera Manager Server, and enabling Kerberos using the wizard.
  • Strong experience in writing scripts using the Python API, PySpark API, and Spark API for analyzing data (a minimal PySpark sketch follows this list).
  • Utilized AWS services with focus on big data analytics, enterprise data warehouse and business intelligence solutions to ensure optimal architecture, scalability, flexibility.
  • Developed Scala based Spark applications for performing data cleansing, event enrichment, data aggregation, de-normalization and data preparation needed for machine learning and reporting teams to consume.
  • Designed AWS architecture, Cloud migration, AWS EMR, DynamoDB, Redshift and event processing using lambda function.
  • Conducted ETL Data Integration, Cleansing, and Transformations using AWS glue Spark script.
  • Responsible for loading unstructured and semi-structured data into Hadoop.
  • Connected to various data sources to extract, transform, and load data before modeling and visualization using Python and SQL.
  • Designed, developed, and maintained data models, pipelines, and production workflows to support financial data across back-end databases.
  • Implemented machine learning solutions using Python to provide reliable and accurate sales predictions and diagnostic self-serve information to clients and business units.
  • Developed algorithms using data science technologies to build analytical models in Python; worked on modeling techniques such as time series, exponential smoothing, and regression.
  • Worked on modeling with a variety of regression, supervised, and unsupervised learning techniques.
  • Worked with data scientist on different libraries to show data’s applicability to business problems.
  • Served on the data support team handling bug fixes, schedule changes, memory tuning, schema changes, and loading of historic data.
  • Worked with AWS IAM to generate new accounts, assign roles and groups.
  • Good knowledge of data marts and dimensional data modeling with the Ralph Kimball methodology (star schema and snowflake modeling for fact and dimension tables) using Analysis Services.
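A minimal PySpark sketch of the cleansing and aggregation pattern described in the bullets above. Table layout, column names, and S3 paths are illustrative assumptions, not details of the actual pipelines (which also included Scala Spark applications).

```python
# Sketch: de-duplicate and clean raw sales events, then aggregate daily totals.
# Source/target paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sales-cleansing-sketch").getOrCreate()

raw = spark.read.json("s3://example-bucket/raw/sales/")          # assumed source

cleaned = (raw
           .dropDuplicates(["order_id"])                          # de-duplicate
           .filter(F.col("amount").isNotNull())                   # drop bad rows
           .withColumn("order_date", F.to_date("order_ts")))      # normalize types

daily = (cleaned
         .groupBy("order_date", "region")
         .agg(F.sum("amount").alias("total_amount"),
              F.count("*").alias("order_count")))

daily.write.mode("overwrite").parquet("s3://example-bucket/curated/daily_sales/")
```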

Environment: Python 3.6/3.9, Redshift, PySpark, EC2, EMR, Glue, S3, Kafka, IAM, PostgreSQL, Teradata, MS SQL Server 2012/2014/2016, MS SQL Server Integration Services 2012/2016, MS Visual Studio, Jenkins, Maven, AWS CLI, Git.

Confidential, Irving TX

Senior Big Data Engineer

Responsibilities:

  • Used Hive joins and HQL for querying the databases, eventually developing complex Hive UDFs.
  • Installed the OS and administered the Hadoop stack on the CDH5 (with YARN) Cloudera distribution, including configuration management, monitoring, debugging, and performance tuning.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
  • Worked on installing Cloudera Manager and CDH, installing the JCE policy file, creating a Kerberos principal for the Cloudera Manager Server, and enabling Kerberos using the wizard.
  • Architected and implemented medium-to-large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
  • Worked on configuring Kerberos authentication in the cluster.
  • Installed and configured Pig and wrote Pig Latin scripts to convert data from text files to Avro format.
  • Developed NiFi workflows to pick up data from a REST API server, the data lake, and an SFTP server and send it to a Kafka broker.
  • Worked on the continuous integration tool Jenkins and automated JAR builds at the end of each day.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Extracted, transformed, and loaded data from source systems to Azure Data Storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Implemented real-time analytics on Cassandra data using the Thrift API.
  • Designed column families in Cassandra, ingested data from RDBMS sources, performed transformations, and exported the data to Cassandra.
  • Worked with data modelers to understand financial data model and provided suggestions to the logical and physical data model.
  • Performed table partitioning and monthly and yearly data archival activities.
  • Developed Python scripts to collect Redshift CloudWatch metrics and automated loading of the data points into a Redshift database (see the sketch after this list).
  • Developed Java MapReduce programs for analysis of sample log files stored in the cluster.
  • Tested the processed data through various test cases to meet business requirements.
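A minimal boto3 sketch of the CloudWatch metric collection referenced above. The AWS region, cluster identifier, and time window are hypothetical; in the pipeline described, the returned datapoints would subsequently be loaded into a Redshift table.

```python
# Sketch: pull hourly CPU utilization datapoints for a Redshift cluster from
# CloudWatch. Region and cluster identifier are hypothetical.
import datetime
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # assumed region

end = datetime.datetime.utcnow()
start = end - datetime.timedelta(hours=1)

resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/Redshift",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "ClusterIdentifier", "Value": "example-cluster"}],  # assumed
    StartTime=start,
    EndTime=end,
    Period=300,
    Statistics=["Average"],
)

for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 2))
```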

Environment: Cloudera CDH 5.13, Ambari, IBM WebSphere, Hive, Kafka, Python, HBase, Spark, Scala, MapReduce, HDFS, Sqoop, Java, Azure, Databricks, Data Lake, Data Factory, Azure SQL, Flume, Linux, Shell Scripting, Tableau, UNIX, SQL, NoSQL.

Confidential, Rockville MD

Big Data Engineer

Responsibilities:

  • Responsible for cluster maintenance, monitoring, commissioning and decommissioning of DataNodes, troubleshooting, managing and reviewing data backups, and managing and reviewing log files.
  • Worked on data cleaning and reshaping, generated segmented subsets using NumPy and Pandas in Python.
  • Wrote and optimized complex SQL queries involving multiple joins and advanced analytical functions to perform data extraction and merging from large volumes of historical data stored in Oracle 11g, validating the ETL processed data in target database.
  • Designed, developed, and implemented the ETL pipelines using python API (PySpark) of Apache Spark.
  • Developed microservices and created APIs with the Python Django framework, using Jenkins as a build tool and an enterprise-level database.
  • Performed DB activities such as indexing, performance tuning, and backup and restore.
  • Expertise in writing Hadoop Jobs for analyzing data using Hive QL (Queries), Pig Latin (Data flow language), and custom MapReduce programs in Java.
  • Created a serverless data ingestion pipeline on AWS using MSK (Kafka) and Lambda functions (see the sketch after this list).
  • Developed Autosys scripts to schedule the Kafka streaming and batch jobs.
  • Built, enhanced, and optimized data pipelines using reusable frameworks with Spark and Kafka to support the data needs of the analytics and business teams.
  • Worked on distributed frameworks such as Apache Spark and Presto in Amazon EMR and Redshift, and interacted with data in other AWS data stores such as Amazon S3 and Amazon DynamoDB.
  • Expert in creating Hive UDFs using Java to analyze the data efficiently.
  • Integrated Kafka with Spark Streaming for real time data processing.
  • Created Session Beans and controller Servlets for handling HTTP requests from Talend.
  • Developed scripts for loading application call logs to S3 and used AWS Glue ETL to load into Redshift for data analytics team.
  • Responsible for developing a data pipeline with Amazon AWS to extract data from weblogs and store it in HDFS; worked extensively with Sqoop for importing metadata from Oracle.
  • Performed statistical analysis using SQL, Python, R programming, and Excel.
  • Used AWS Data Pipeline to schedule an Amazon EMR cluster to clean and process web server logs stored in an Amazon S3 bucket.
  • Worked extensively with Excel VBA macros and Microsoft Access forms.
  • Designed and developed data mapping procedures for ETL (data extraction, data analysis, and loading) to integrate data using R programming.
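A minimal sketch of a Lambda handler for the MSK (Kafka) serverless ingestion pattern mentioned above, landing raw records in S3. The bucket name and object key layout are illustrative assumptions.

```python
# Sketch: Lambda handler for an MSK event source; decodes Kafka records and
# writes them to S3 as a JSON batch. Bucket and key layout are hypothetical.
import base64
import json
import uuid
import boto3

s3 = boto3.client("s3")
BUCKET = "example-ingest-bucket"  # assumed bucket

def handler(event, context):
    rows = []
    for topic_partition, records in event.get("records", {}).items():
        for record in records:
            payload = base64.b64decode(record["value"]).decode("utf-8")
            rows.append({"topic": record["topic"],
                         "offset": record["offset"],
                         "payload": payload})
    if rows:
        key = f"raw/kafka/{uuid.uuid4()}.json"  # assumed key layout
        s3.put_object(Bucket=BUCKET, Key=key,
                      Body=json.dumps(rows).encode("utf-8"))
    return {"written": len(rows)}
```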

Environment: Cloudera CDH4.3, Hadoop, Pig, Hive, Informatica, HBase, MapReduce, HDFS, Sqoop, Impala, SQL, Tableau, Python, SAS, Flume, Oozie, Linux

Confidential, Johnston, RI

Spark Developer

Responsibilities:

  • Created a database in Microsoft Access starting from a blank database, created tables with appropriate data types, entered datasets manually, produced an ER diagram, and ran basic SQL queries against the database.
  • Documented the complete process flow to describe program development, logic, testing, implementation, application integration, coding.
  • Developed predictive analytics using Apache Spark and Scala APIs.
  • Worked on SQL Server Integration Services (SSIS) to integrate and analyze data from multiple heterogeneous information sources.
  • Used SQL and SQL Server to write simple and complex queries, such as finding distinct and unique values in a dataset.
  • Created Complex ETL Packages using SSIS to extract data from staging tables to partitioned tables with an incremental load.
  • Performed incremental load with several Dataflow tasks and Control Flow Tasks using SSIS.
  • Involved in writing complex SQL Queries, Stored Procedures, Triggers, Views, Cursors, Joins, Constraints, DDL, DML, and User-Defined Functions to implement the business logic and created clustered and non-clustered indexes.
  • Imported data using Sqoop to load data from MySQL to HDFS regularly.
  • Developed and delivered SQL Scripts to Insert/Update and Delete data in MS SQL database tables.
  • Created various ad-hoc SQL queries for customer requirements, executive management reports and types of report types like tables, matrix, sub reports etc.
  • Designed and built a custom, generic ETL framework as a Spark application using Scala.
  • Worked on a data ingestion file-validation component covering threshold levels, last-modified checks, and checksums (see the sketch after this list).
  • Collected log data from web servers and exported it to HDFS.
  • Involved in defining job flows, management, and log file reviews.
  • Installed Oozie workflows to run Spark and Pig jobs simultaneously.
  • Created Hive tables to store the data in table format.
  • Strong programming capability using Python and Hadoop framework utilizing Cloudera Hadoop Ecosystem projects (HDFS, Hadoop, Sqoop, Hive, HBase, Oozie, Impala, Zookeeper, etc.)
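A minimal Python sketch of the kind of ingestion file-validation checks described above (row-count threshold, last-modified recency, checksum). The default limits and the expected checksum are hypothetical values.

```python
# Sketch: validate an ingestion file on row-count threshold, freshness, and
# MD5 checksum. Default limits are hypothetical.
import hashlib
import os
import time

def validate_file(path, min_rows=100, max_age_hours=24, expected_md5=None):
    """Return (ok, reasons) for a single ingestion file."""
    reasons = []

    with open(path, "rb") as f:
        data = f.read()

    if data.count(b"\n") < min_rows:                  # threshold check
        reasons.append("row count below threshold")

    age_hours = (time.time() - os.path.getmtime(path)) / 3600.0
    if age_hours > max_age_hours:                     # freshness check
        reasons.append("file older than allowed window")

    if expected_md5 and hashlib.md5(data).hexdigest() != expected_md5:
        reasons.append("checksum mismatch")           # integrity check

    return (not reasons, reasons)
```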

Environment: Spark, Scala, HDFS, SQL, Oozie, Sqoop, Zookeeper, MySQL, HBase.

Confidential

SQL Developer

Responsibilities:

  • Performed logical and physical database development (using Erwin, normalization, dimensional modeling, and Enterprise Manager).
  • Designed the dashboard of the project for the top management.
  • Worked as a developer in creating complex Stored Procedures, SSIS packages, triggers, cursors, tables, and views and other SQL joins and statements for applications.
  • Worked on MS SQL Server tasks such as data loading, batch jobs, SSIS, Instances, Linked Servers and Indexes.
  • Created and automated the regular jobs.
  • Managed and monitored the system utilization.
  • Managed the security of the servers (Logins and users).
  • Developed complex T-SQL code for the application.
  • Designed and implemented the stored procedures and triggers for automating tasks.
  • Prepared end-user and technical documentation and published it to production.

Environment: SQL Server 2005/2008, Visual Source Safe, MS Access, Query Analyzer, SQL Profiler, Import & Export Data, Windows 2000 Server, Erwin, HTML, JavaScript, Linux, PHP, AS400, DB2
