
Senior Big Data Developer Resume


Dallas, TX

SUMMARY

  • Proficient IT professional with 6+ years of experience, specializing in the Big Data ecosystem - data acquisition, ingestion, modeling, storage, analysis, integration, and data processing.
  • Experience in project development, implementation, deployment, and maintenance using Java/J2EE, Hadoop, and Spark-related technologies on Cloudera, Hortonworks, Amazon EMR, and Azure HDInsight.
  • Working experience in designing and implementing complete end-to-end Hadoop-based data analytical solutions using HDFS, MapReduce, Spark, YARN, Kafka, Pig, Hive, Sqoop, Storm, Flume, Oozie, Impala, HBase, Airflow, Zookeeper, Ambari, NiFi, Databricks, and Delta Lake on Databricks.
  • A data science expert with strong problem-solving, debugging, and analytical capabilities, who effectively engages in understanding and conveying business requirements.
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, MapReduce, and Spark.
  • Collaborated closely with business product, production support, and engineering teams on a regular basis to dive deep into data, enable effective decision making, and support analytics platforms.
  • Good experience in creating data ingestion pipelines, data transformations, data management, data governance and real time streaming at an enterprise level.
  • Proficient at writing MapReduce jobs and UDFs to analyze, transform, and deliver data as per requirements.
  • Profound experience in creating real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka (see the illustrative streaming sketch following this summary).
  • Expertise in building PySpark and Scala applications for interactive analysis, batch processing, and stream processing.
  • Strong working experience with SQL and NoSQL databases, data modeling and data pipelines. Involved in end-to-end development and automation of ETL pipelines using SQL and Python.
  • Experience developing Pig Latin and HiveQL scripts for data analysis and ETL purposes, extending the default functionality by writing User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) for custom data-specific processing.
  • Experience working with CQL (Cassandra Query Language) to retrieve data from a Cassandra cluster by running CQL queries, including prepared statements that can be executed multiple times.
  • Experience with migrating data to and from RDBMS into HDFS using Sqoop.
  • Experience in job workflow scheduling and monitoring tools like Oozie and Zookeeper.
  • Hands-on Experience with AWS cloud (EMR, EC2, RDS, EBS, S3, Lambda, Glue, Elasticsearch, Kinesis, SQS, DynamoDB, Redshift, ECS).
  • Good hands-on full life cycle implementation experience using Cloudera (CDH) and Hortonworks (HDP) distributions.
  • Working Experience on Azure cloud components (HDInsight, Databricks, Data Lake, Blob Storage, Data Factory, Storage Explorer, SQL DB, SQL DWH, Cosmos DB).
  • Profound experience using Spark APIs for streaming real-time data, staging, cleansing, applying transformations, and preparing data for machine learning needs.
  • Strong experience in collecting and storing stream data like log data in HDFS using Apache Flume.
  • Experience with Oozie Workflow Engine to automate and parallelize Hadoop Map/Reduce, Hive and Pig jobs.
  • Proficient in using Cloudera Manager, an end-to-end tool to manage Hadoop operations in a Cloudera cluster.
  • Worked with various streaming ingestion services with batch and real-time processing using Spark Streaming, Confluent Kafka, Storm, Flume, and Sqoop.
  • Leveraged different file formats: Parquet, Avro, ORC, and flat files.
  • Sound experience in building production ETL pipelines between several source systems and Enterprise Data Warehouse by leveraging Informatica PowerCenter, SSIS, SSAS and SSRS.
  • Built an ETL decision-making engine using Spark, Hive, Scala, and Unix shell scripts, with front-end dashboards built using Elasticsearch and Kibana.
  • Hands-on experience using Azure services: Portal, Azure Cosmos DB, Azure Synapse Analytics, Azure Data Lake Storage, Azure Data Factory, Azure Stream Analytics, Azure Databricks, Azure Log Analytics, and Azure Blob Storage.
  • Solid knowledge of Dimensional Data Modeling with Star Schema and Snowflake for FACT and Dimensions Tables using Analysis Services and identifying the trends in the data.
  • Experience using various IDEs (Eclipse, IntelliJ) and the SVN and Git version control systems.
  • Experience in designing interactive dashboards, reports, performing ad-hoc analysis and visualizations using Tableau, Power BI, Arcadia and Matplotlib.
  • Sound knowledge and hands-on experience with NLP, image detection, MapR, IBM InfoSphere suite, Storm, Flink, Talend, ER Studio, and Ansible.
  • Works successfully in a fast-paced environment, both independently and collaboratively, with expertise in complex troubleshooting, root-cause analysis, and solution development.
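
A minimal PySpark Structured Streaming sketch of the kind of Kafka-based real-time pipeline described above. The broker address, topic, schema, and output paths are hypothetical placeholders, and the Kafka connector package is assumed to be on the Spark classpath.

```python
# Consume JSON events from Kafka, apply a light transformation, and land the
# result as Parquet. Broker, topic, schema, and paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-stream-ingest").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
       .option("subscribe", "events")                        # placeholder topic
       .option("startingOffsets", "latest")
       .load())

# Parse the Kafka value payload into typed columns.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

# Continuously append the cleaned events as Parquet files.
query = (events.writeStream
         .format("parquet")
         .option("path", "/data/landing/events")             # placeholder path
         .option("checkpointLocation", "/data/checkpoints/events")
         .outputMode("append")
         .start())

query.awaitTermination()
```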

TECHNICAL SKILLS

Big Data Technologies: HDFS, Yarn, MapReduce, Pig, Hive, HBase, Cassandra, Zookeeper, Oozie, Sqoop, Flume, HCatalog, Apache Spark, Scala, Impala, Kafka, Storm, Tez, Ganglia, Nagios, Splunk, Elasticsearch, Kibana

Hadoop Distributions: Apache Hadoop 2.x/1.x, Cloudera CDP, Hortonworks HDP, Amazon EMR (EMR, EC2, EBS, RDS, S3, Glue, Elasticsearch, Lambda, Kinesis, SQS, DynamoDB, Redshift, ECS), Azure HDInsight (Databricks, Data Lake, Blob Storage, Data Factory, SQL DB, SQL DWH, Cosmos DB, Azure DevOps, Active Directory).

Operating Systems: Windows, Macintosh, Linux, Ubuntu, Unix, CentOS, Red Hat.

Programming Languages: C, JAVA, J2EE, SQL, Pig Latin, HiveQL, Scala, Python, Unix Shell Scripting, R.

Java Technologies: JSP, Servlets, Spring, Hibernate, Maven

Web Development: Spring, J2EE, JDBC, Okta, Postman, Swagger, Angular, JFrog, Mockito, Flask, Hibernate, Maven, Tomcat, WebSphere, JavaScript, Node.js, HTML, CSS.

Databases: MS-SQL SERVER, Oracle, MS-Access, MySQL, Teradata, PostgreSQL, DB2.

NoSQL Database: Cassandra, MongoDB, Redis.

Reporting Tools/ETL Tools: Informatica, Talend, SSIS, SSRS, SSAS, ER Studio, Tableau, Power BI, Arcadia, DataStage, Pentaho.

Methodologies: Agile/Scrum, Waterfall.

Development Tools: Eclipse, NetBeans, IntelliJ, Hue, Microsoft Office Suite (Word, Excel, PowerPoint, Access)

Others: Machine learning, NLP, StreamSets, Spring Boot, Jupyter Notebook, Terraform, Docker, Kubernetes, Jenkins, Chef, Ansible, Splunk, Jira.

PROFESSIONAL EXPERIENCE

Confidential, Dallas, TX

Senior Big Data Developer

Responsibilities:

  • Installed and configured Apache Hadoop big data components such as HDFS, MapReduce, YARN, Hive, HBase, Sqoop, Pig, Ambari, and NiFi.
  • Migrated from JMS Solace to Apache Kafka and used Zookeeper to manage synchronization, serialization, and coordination across the cluster.
  • Designed and developed Azure Data Factory (ADF) pipelines extensively for ingesting data from different relational and non-relational source systems to meet business functional requirements.
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Created and provisioned multiple Databricks clusters needed for batch and continuous streaming data processing and installed the required libraries for the clusters.
  • Designed and built Extract, Transform, Load procedures as SQL Server Integration Services packages to import/export data feeds to/from the data warehouse and client systems.
  • Expertise in Creating, Debugging, Scheduling and Monitoring jobs using Airflow for ETL batch processing to load into Snowflake for analytical processes.
  • Experience in moving data between GCP (Google Cloud Platform) and Azure using Azure Data Factory.
  • Developed custom ETL solutions, batch processing and real-time data ingestion pipeline to move data in and out of Hadoop using PySpark and shell scripting.
  • Ingested a huge volume and variety of data from disparate source systems into Azure Data Lake Gen2 using Azure Data Factory V2 and Azure cluster services.
  • Designed multiple applications to consume and transport data from S3 to EMR and Redshift, maintained on EC2.
  • Processed multiple terabytes of data stored in AWS using Elastic MapReduce (EMR) and loaded it into AWS Redshift.
  • Ingested data through AWS Kinesis Data Stream and Firehose from various sources to S3.
  • Automated the resulting scripts and workflow using Apache Airflow and shell scripting to ensure daily execution in production.
  • Developed Spark applications using Scala and Java and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Responsible for creating on-demand tables on S3 files using Lambda functions and AWS Glue with Python and PySpark.
  • Implemented Spark using Scala and utilized Spark SQL heavily for faster development and processing of data.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark SQL and Scala.
  • Created Airflow scheduling scripts in Python (see the illustrative DAG sketch after this list).
  • Built and wrote scripts for data modeling and mining to give PMs and EMs easier access to Azure Logs and App Insights.
  • Handled requests for SQL objects, schedule and business logic changes, and ad-hoc queries from customers, and analyzed and resolved data sync issues.
  • Developed and deployed the outcome using Spark and Scala code on a Hadoop cluster running on GCP (Google Cloud Platform).
  • Designed and developed Hadoop MapReduce programs and algorithms for analysis of cloud-scale classified data stored in Cassandra.
  • Worked on data analysis and reporting with Power BI on customer usage metrics, and presented this analysis to leadership and a motivated team of engineers and product managers to support product growth.
  • Converted and reviewed code migrated from Oracle PL/SQL to Snowflake, made performance changes, and tested.
  • Perform ongoing monitoring, automation, and refinement of data engineering solutions.
  • Created Linked service to land the data from SFTP location to Azure Data Lake.
  • Created several Databricks Spark jobs with PySpark to perform table-to-table operations.
  • Experience working with both Agile and Waterfall methods in a fast-paced manner.
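
A hedged sketch of the kind of Airflow scheduling script referenced above for a daily ETL batch load into Snowflake. The DAG id, schedule, task callables, and load targets are illustrative assumptions, not the actual production code.

```python
# Illustrative Airflow DAG: extract -> transform -> load into Snowflake daily.
# DAG id, schedule, and callables are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_source_data(**context):
    # Placeholder: pull the day's files from the landing zone (e.g., ADLS/S3).
    print("extracting data for", context["ds"])

def transform_with_spark(**context):
    # Placeholder: submit the PySpark/Databricks job that cleans the data.
    print("transforming data for", context["ds"])

def load_into_snowflake(**context):
    # Placeholder: copy the transformed files into Snowflake staging tables.
    print("loading data for", context["ds"])

default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="daily_etl_to_snowflake",        # hypothetical DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_source_data)
    transform = PythonOperator(task_id="transform", python_callable=transform_with_spark)
    load = PythonOperator(task_id="load", python_callable=load_into_snowflake)

    # Simple linear dependency chain: extract, then transform, then load.
    extract >> transform >> load
```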

Confidential, Denver, CO

Big Data Developer

Responsibilities:

  • Developed various data loading strategies and performed various transformations for analyzing the datasets by using Hortonworks Distribution for Hadoop ecosystem.
  • Worked in a Databricks Delta Lake environment on AWS using Spark.
  • Developed a Spark-based ingestion framework for ingesting data into HDFS, creating tables in Hive, and executing complex computations and parallel data processing.
  • Designed and developed Hadoop MapReduce programs and algorithms for analysis of cloud-scale classified data stored in Cassandra.
  • Worked on scheduling all jobs using Airflow scripts in Python, adding different tasks to DAGs and dependencies between the tasks.
  • Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Cloud Storage, and BigQuery.
  • Created and managed policies for S3 buckets and utilized S3 and Glacier for storage and backup on AWS.
  • Developed Impala queries for faster querying and to perform data transformations on Hive tables.
  • Worked on creating Hive tables and wrote Hive queries for data analysis to meet business requirements, and used Sqoop to import and export data from Oracle and MySQL.
  • Implemented Spark to migrate MapReduce jobs into Spark RDD transformations and Spark streaming.
  • Developed an application to clean semi-structured data such as JSON into structured files before ingesting it into HDFS (see the sketch after this list).
  • Automated the process of transforming and ingesting terabytes of monthly data using Kafka, S3, Lambda, and Oozie.
  • Hands-on with Amazon EC2, Amazon S3, Amazon Redshift, Amazon EMR, Amazon RDS, Amazon ELB, AWS CloudFormation, and other services of the AWS family.
  • Created the schema and configuration to convert and map the CRF into CIF-Common Interchange Format for interacting with another system called Reltio (Mastering data management tool).
  • Implemented the code which handles data type conversions, data value mappings and checking for required fields.
  • Developed functionality to handle merging records in Reltio by adding an additional field in the CRF.
  • Modified the pipeline for allowing messages to receive the new incoming records for merging, handling missing (NULL) values and triggering a corresponding merge in records on receipt of a merge event.
  • Worked on propagating any updates or creation of new Person records in Salesforce down to the Delta Lake tables by adding the new data to the database.
  • Wrote SQL queries to identify and validate data inconsistencies in data warehouse against source system.
  • Worked on scheduling all jobs using Airflow scripts in Python, adding different tasks to DAGs and Lambda.
  • Created an internal shell-script tool for comparing RDBMS and Hadoop data so that all data in source and target matches, decreasing the complexity of moving data.
  • Experience in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and coordinated tasks among the team.
  • Executed several complex SQL queries in AWS Glue for ETL operations on Spark data frames using Spark SQL.
  • Created a program in Python to handle PL/SQL features such as cursors and loops that are not supported by Snowflake.
  • Worked on Spark SQL, created data frames by loading data from Hive tables and created prepared data and stored in AWS S3 and interact with the SQL interface using the command line or JDBC.
  • Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Used AWS CloudFormation in designing and deploying scalable, highly available, and fault tolerant systems on AWS.
  • Helped the QA team with testing and troubleshooting Spark job run failures.
  • Created and managed Kafka topics and producers for the streaming data.
  • Worked in Agile development environment and Participated in daily scrum and other design related meetings.
  • Imported and exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team using Power BI with an automated trigger API.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
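
A hedged PySpark sketch of the kind of JSON-cleaning ingestion described above. The input path, field names, and Hive table name are hypothetical placeholders, not the actual project code.

```python
# Clean semi-structured JSON and write it to a partitioned Hive table.
# Paths, schema fields, and the table name are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

spark = (SparkSession.builder
         .appName("json-cleaning-ingest")
         .enableHiveSupport()
         .getOrCreate())

# Read raw JSON events from HDFS (placeholder path).
raw = spark.read.json("hdfs:///data/raw/events/")

# Flatten the fields of interest, standardize types, and drop bad records.
cleaned = (raw
           .select(
               col("id").cast("string").alias("event_id"),
               col("payload.customer_id").alias("customer_id"),
               col("payload.amount").cast("double").alias("amount"),
               to_date(col("event_time")).alias("event_date"))
           .dropDuplicates(["event_id"])
           .na.drop(subset=["event_id", "event_date"]))

# Write as a partitioned Parquet-backed Hive table for Hive/Impala queries.
(cleaned.write
 .mode("append")
 .partitionBy("event_date")
 .format("parquet")
 .saveAsTable("analytics.events_clean"))   # hypothetical database.table
```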

Confidential, Minneapolis, MN

Big Data Developer

Responsibilities:

  • Developing an end-to-end ETL data pipeline that takes the data from surge and loads it into the RDBMS using Spark.
  • Experienced in the Hadoop framework and HDFS and MapReduce processing implementation.
  • Worked on a live Hadoop cluster running the Cloudera distribution (CDH 5.9) as well as cloud-deployed persistent AWS EMR clusters, and configured the clusters.
  • Designed SSIS (ETL) Packages to extract data from various heterogeneous data sources such as Access database, Excel spreadsheet and flat files into SQL Server and maintain the data.
  • Developing data load functions that read the schema of the input data and load the data into a table.
  • Used Airflow for scheduling the Hive, Spark and MapReduce jobs.
  • Working with Spark SQL to analyze and apply transformations on data frames created from the SQS queue, load them into database tables, and query them.
  • Working with Amazon S3 to persist the transformed Spark data frames in S3 buckets and using Amazon S3 as a data lake for the data pipeline running on Spark and MapReduce.
  • Designing and developing analytics, machine learning models, and visualizations that drive performance and provide insights, from prototyping to production deployment, including product recommendation and allocation planning.
  • Developing logging functions in Scala which stores logs of the pipeline in Amazon S3 buckets.
  • Developing Email reconciliation reports for ETL load in Scala using Java Libraries in Spark framework.
  • Developing AWS Lambda functions that create the EMR cluster and auto-terminate it after the job is done (see the sketch after this list).
  • Working on AWS CloudFormation templates, creating a CFT and stack for provisioning the EMR cluster.
  • Working on AWS SNS, which subscribes AWS Lambda and sends an SNS alert when the data reaches the lake.
  • Adding the steps to the EMR cluster within the bootstrap actions of AWS Lambda.
  • Experience in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and coordinating tasks among the team.
  • Setting up data pipelines in StreamSets to copy data from Oracle to Snowflake.
  • Working on fetching data from various source systems such as Hive, Amazon S3, and Kafka.
  • Worked with the Hive data warehouse infrastructure: creating tables, distributing data by implementing partitioning and bucketing, and writing and optimizing HQL queries.
  • Exposure to Spark architecture and how RDDs work internally, processing data from local files, HDFS, and RDBMS sources by creating RDDs and optimizing for performance.
  • Spark Streaming collects this data from Kafka in near-real-time and performs the necessary transformations and aggregation on the fly to build the common learner data model, persisting the data in a NoSQL store (HBase).
  • Enhanced scripts of existing Python modules and worked on writing APIs to load the processed data into HBase tables.
  • Extensively used Accumulators and Broadcast variables to tune the spark applications and to monitor the spark jobs.
  • Implemented Data Integrity and Data Quality checks in Hadoop using Hive and Linux scripts.
  • Used Apache Airflow in a GCP Cloud Composer environment to build data pipelines, using various Airflow operators such as the Bash operator, Hadoop operators, and Python callable and branching operators.
  • Tuning Hadoop performance with high availability and involved in recovery of Hadoop clusters.
  • Used Tableau as a front-end BI tool and MS SQL Server as a back-end database to design and develop dashboards, workbooks, and complex aggregate calculations.
  • Involved in Agile methodologies, daily scrum meetings, and sprint planning.
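
A hedged sketch of the kind of Lambda function described above that launches a transient EMR cluster and lets it auto-terminate once its Spark step completes. The cluster name, release label, instance types, roles, and S3 paths are hypothetical placeholders.

```python
# Illustrative AWS Lambda handler (Python + boto3): launch a transient EMR
# cluster with one Spark step; the cluster terminates when the step finishes.
import boto3

emr = boto3.client("emr")

def lambda_handler(event, context):
    response = emr.run_job_flow(
        Name="transient-spark-etl",                  # placeholder cluster name
        ReleaseLabel="emr-6.5.0",                    # placeholder release
        Applications=[{"Name": "Spark"}],
        LogUri="s3://my-etl-logs/emr/",              # placeholder log bucket
        Instances={
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # Transient cluster: shut down once all steps complete.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        Steps=[{
            "Name": "spark-etl-step",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "--deploy-mode", "cluster",
                         "s3://my-etl-code/jobs/etl_job.py"],  # placeholder job
            },
        }],
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
    return {"cluster_id": response["JobFlowId"]}
```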

Confidential

Java Developer

Responsibilities:

  • Involved in all the phases of Software Development Life Cycle (requirements gathering, analysis, design, development, testing, and maintenance).
  • Involved in development of JavaScript code for client-side validations.
  • Developed the HTML based web pages for displaying the reports.
  • Developed front-end screens using JSP, HTML, jQuery, JavaScript, and CSS.
  • Performed data validation in Struts from beans and Action Classes.
  • Developed dynamic content of presentation layer using JSP.
  • Accessed stored procedures and functions using JDBC Callable statements.
  • Involved in designing use-case diagrams, class diagrams and interaction using UML model with Rational Rose.
  • Applied J2EE Design Patterns such as Factory, Singleton, and Business delegate, DAO, Front Controller Pattern and MVC and deployed components on Application server where Eclipse was used for component building.
  • Implemented Hibernate to persist the data into Database and wrote HQL based queries to implement CRUD operations on the data.
  • Developed the application using the Spring Framework, which leverages the Model-View-Controller (MVC) architecture; UML diagrams such as use cases, class diagrams, interaction diagrams, and activity diagrams were used.
  • Developed XML configuration files and properties files used in the Struts Validator framework for validating form inputs on the server side. Used SOAP as an XML-based protocol for web service operation invocation.
  • Involved in deployment of application on WebLogic Application Server in Development & QA environment.
  • Developed coding using SQL, PL/SQL, Queries, Joins, Views, Procedures/Functions, Triggers and Packages.
  • Developed web applications as rich internet applications using Java applets, Silverlight, and Java.
  • Used JDBC for database access. Used Log4J to validate functionalities and JUnit for unit testing.
  • Played a key role in the high-level design for the implementation of the application.
  • Designed and established the process and mapping the functional requirement to the workflow process.
