Sr. Data Engineer/ML Developer Resume
TX
SUMMARY
- 10+ years of experience in Information Technology in application development and analytics projects using Java, J2EE, Scala, Python and the Big Data/Hadoop ecosystem.
- Experienced in all aspects of the software development life cycle, including analysis, design, development, modeling, testing and deployment of applications (including machine learning), and support of relational databases, data warehousing systems and data marts in various domains.
- 7+ years of hands-on experience in Big Data and Hadoop ecosystem components (HDFS, MapReduce, Pig, Hive, Avro, HBase, Sqoop, Flume, Oozie, ZooKeeper, Kafka, Spark (Scala), Storm, Cassandra).
- 6 years of hands-on experience in software design and development using Java (Core Java, Collections Framework, JDBC, Servlets, JSP, Spring Boot, Hibernate, JavaScript, Web Services (REST API)).
- Experience with both the Cloudera and Hortonworks platforms.
- Very good knowledge of Apache Spark with machine learning.
- Experience working in Agile methodologies.
- Expertise in data modeling techniques such as dimensional modeling, star schema and snowflake schema modeling.
- Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks and Azure SQL Data Warehouse.
- Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation and aggregation from multiple file formats for analyzing and transforming the data (see the sketch following this summary).
- Experienced with Azure Data Factory and linked services, and in creating Delta tables from Parquet files.
- Experienced with Apache Phoenix, a massively parallel relational database engine supporting OLTP for Big Data on top of Apache HBase.
- Experience with Talend Open Studio for ETL/data warehousing and the Talend Enterprise platform for data management.
- Experience in the design and development of ETL (Extract, Transform, Load) methodology for supporting data transformations and processing.
- Worked with UNIX and Linux (CentOS, Ubuntu).
- Installed all Hadoop components on the Linux file system for compatibility.
- Monitored all machines for proper data updates and designed applications for getting data into the framework.
- Worked on Snowflake data warehousing schemas.
- Strong analytical skills; designed different machine learning models trained on historical data to predict future behavior.
- Worked with MySQL for designing queries on Spark SQL.
- Highly experienced IT professional with 9+ years of commitment to excellence and implementation of best practices worldwide, specializing in Big Data (Hadoop ecosystems).
- Very good experience with Apache Spark, Kafka, Storm, Scala, Java and Python.
- Developed multiple Spark SQL and Spark Streaming applications.
- Designed and deployed SBT-built Apache Spark applications on the cluster for data processing.
- Developed Kafka connections for moving data to HDFS and designed Spark Streaming jobs.
- Hands-on experience with the major components of the Big Data and Hadoop ecosystems.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Integrated Cassandra with different messaging systems like Kafka and processing frameworks like Apache Spark.
- Involved in optimization of Hive queries.
- Involved in data ingestion to HDFS from various data sources like Oracle, Teradata and mainframe systems.
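A minimal sketch of the kind of Spark SQL work summarized above, assuming hypothetical input paths and column names (orders stored as Parquet, customers as JSON); it reads both formats, joins them and produces an aggregate:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object MultiFormatAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("MultiFormatAggregation")
      .getOrCreate()

    // Hypothetical sources: orders stored as Parquet, customers as JSON.
    val orders    = spark.read.parquet("/mnt/raw/orders")
    val customers = spark.read.json("/mnt/raw/customers")

    // Join the two sources and aggregate order amounts per customer segment.
    val summary = orders
      .join(customers, Seq("customer_id"))
      .groupBy("segment")
      .agg(count("order_id").as("order_count"),
           sum("amount").as("total_amount"))

    // Persist the aggregated result for downstream analysis.
    summary.write.mode("overwrite").parquet("/mnt/curated/order_summary")

    spark.stop()
  }
}
```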
PROFESSIONAL EXPERIENCE
Confidential, TX
Sr. Data Engineer/ML Developer
Environment: Java, Scala, Hive, Oozie, UNIX scripting, Kafka, Oracle, Sqoop, Teradata, Spark (Spark Streaming, Spark SQL, Spark Machine Learning), Azure, Databricks.
Responsibilities:
- Designed Spark ML models to predict customer call volumes.
- Created Spark RDDs and DataFrames using Spark with Scala.
- Extracted, transformed and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, Spark SQL and Azure Data Lake Analytics.
- Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Designed different Spark ML classification models such as random forest and logistic regression to classify text remarks and predict call intents (see the sketch following this list).
- Involved in performance improvement and tuning techniques while testing the Spark applications.
- Involved in migrating data from on-premises sources to Azure data services.
- Developed Spark UDAFs to generate the list of predictors for the Spark models.
- Analyzed Hive data and generated reports used to improve the Spark models' true positive rates, precision and accuracy.
- Imported data from Teradata to HDFS using Sqoop.
- Designed Kafka producers and consumers to ingest real-time streaming data into Hive tables.
- Worked with JSON and Parquet files in Spark applications to ingest the data.
- Automated data ingestion into Hive from various data sources such as Oracle and Teradata.
- Created the change requests required for application deployments.
- Worked with Hadoop admins to request additional cluster nodes and to resolve cluster slowness.
- Developed REST APIs using Scala and Java to send data to the DVS platform.
- Developed a data pipeline using Kafka and Spark Streaming to ingest data into Hive tables.
- Reported Spark model outputs to mobile-first teams for further analysis.
- Automated the Sqoop scripts using Oozie to schedule the data ingestion.
- Worked with the Eclipse IDE and Maven to build and support the project.
- Took care of all development, testing and deployments for the models, and monitored the scripts.
- Designed Kafka listeners acting as producers to write streaming data to Kafka brokers.
- Used Scala as the underlying language for the Spark prediction models.
- Developed Sqoop scripts to ingest data into Hadoop.
- Developed Spark jobs in Scala to perform operations such as data aggregation, data analysis and data processing.
- Monitored the Spark history server logs during application failures.
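A minimal Spark ML sketch of the text-classification approach referenced in this list (logistic regression over call remarks to predict call intent); the Hive table call_remarks and its remark_text and intent_label columns are hypothetical:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, StringIndexer, Tokenizer}
import org.apache.spark.sql.SparkSession

object CallIntentModel {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CallIntentModel")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical Hive table holding free-text remarks and a labelled intent.
    val remarks = spark.table("call_remarks")
      .select("remark_text", "intent_label")

    // Tokenize the remark text, hash tokens into term-frequency features,
    // index the string label, then fit a logistic regression classifier.
    val tokenizer = new Tokenizer().setInputCol("remark_text").setOutputCol("words")
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features").setNumFeatures(10000)
    val indexer   = new StringIndexer().setInputCol("intent_label").setOutputCol("label")
    val lr        = new LogisticRegression().setMaxIter(20).setRegParam(0.01)

    val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, indexer, lr))

    val Array(train, test) = remarks.randomSplit(Array(0.8, 0.2), seed = 42)
    val model = pipeline.fit(train)

    // Score the hold-out set; "prediction" carries the predicted intent index.
    model.transform(test).select("remark_text", "prediction").show(10, truncate = false)

    spark.stop()
  }
}
```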
Confidential
Hadoop/Spark Developer
Environment: Java, Scala (Functions, OOP), Apache Kafka, Sqoop, Hive, Presto, Apache Spark (Spark Streaming, Spark ML and Spark SQL), Python, UNIX scripting, Oozie.
Responsibilities:
- Led the complete implementation life cycle, including the requirements understanding, pre-scoping, data profiling, design, development, testing and implementation phases of the Big RED platform.
- Migrated web sales, store sales and guest data from the EDW to the Confidential Big Data platform, which allows merchants to generate ad-hoc reports and dashboards.
- Developed a framework to ingest data into the Big Data platform for discovery and analysis using Sqoop.
- Created partitioned Hive tables and implemented standards for the ingestion of text, JSON and XML files in Parquet and ORC formats, with data processing using Pig scripts and UDFs.
- Created workflow templates for ingestion with Oozie actions: Hive, shell, SSH, Sqoop, Pig, mail, etc.
- Ingested data from multiple sources (clickstream, marketing stitch, guest mail, geographics) and created a hash key/unique identifier to send promotions to guests with personalized devices.
- Led data discovery and integration services, exploring new data sources and collaborating with product owners and business partners.
- Developed web crawlers to summarize competitive prices for online items based on UPCs.
- Implemented a real-time ingestion data pipeline using Kafka, Spark and Scala to store data in Hadoop (see the sketch following this list).
- Applied data analytics and machine learning algorithms for pattern identification and implemented an alert mechanism using prediction model results.
- Governed application development activities to ensure compliance with enterprise standards.
- Coordinated with the Hadoop admin on cluster job performance and security issues, and with the Hortonworks team to resolve compatibility and version issues with HDP, Hive, Spark and Oozie.
- Reviewed test case coverage and coordinated testing meetings (SIT/UAT) with the stakeholders.
- Automated Oozie workflows in the Control-M scheduler and supported running jobs on the cluster.
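A minimal sketch of the real-time Kafka-to-Hadoop ingestion referenced in this list, written here with the Spark Structured Streaming API (a DStream-based version would be similar); the broker address, topic name and output paths are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object KafkaToHdfsIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KafkaToHdfsIngest")
      .getOrCreate()

    // Read the raw event stream from a hypothetical Kafka topic.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "sales_events")
      .option("startingOffsets", "latest")
      .load()
      .selectExpr("CAST(key AS STRING) AS event_key",
                  "CAST(value AS STRING) AS event_json",
                  "timestamp")

    // Land the events in HDFS as Parquet; a Hive external table over this
    // path makes the data queryable downstream.
    val query = events.writeStream
      .format("parquet")
      .option("path", "/data/raw/sales_events")
      .option("checkpointLocation", "/data/checkpoints/sales_events")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}
```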
Confidential
J2EE/Spark Developer
Environment: Java, JSP, Servlets, Hibernate, REST API, Tomcat, WebLogic, Spark (Spark SQL, Spark Streaming), Cassandra, Scala (Functions, OOP), Kafka, Sqoop, Hive, Oozie
Responsibilities:
- Worked with data ingestion techniques to move data from various sources to HDFS.
- Designed applications that store data in HDFS using Kafka for better performance.
- Integrated Spark and Kafka to feed Kafka events into Spark as input.
- Designed RDDs with Spark Streaming and Spark SQL.
- Presently implementing Kafka.
- Worked with MongoDB and Cassandra NoSQL databases.
- Presently implementing Storm and Spark.
- Analyzed different formats of data.
- Worked on writing MapReduce programs using Java.
- Extensively worked with partitioned and bucketed tables in Hive and designed both managed and external tables (see the sketch following this list).
- Worked on optimization of Hive queries.
- Created and worked with Sqoop jobs with full-refresh and incremental loads to populate Hive external tables.
- Worked on Pig to do data transformations.
- Developed UDFs for MapReduce, Hive and Pig.
- Worked on HBase and its integration with Storm.
- Worked on Apache Flume for getting data from Twitter to HDFS.
- Designed and created Oozie workflows to schedule and manage Hadoop, Hive, Pig and Sqoop jobs.
- Worked with RDBMS import and export to HDFS.
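A minimal sketch of the partitioned Hive external table pattern referenced in this list, issued through Spark SQL with Hive support; the table, columns and HDFS paths are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object HivePartitionedTables {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HivePartitionedTables")
      .enableHiveSupport()
      .getOrCreate()

    // External table over data already landed in HDFS, partitioned by load date
    // so that ingestion jobs can add one partition per run.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS sales_ext (
        |  order_id    STRING,
        |  customer_id STRING,
        |  amount      DOUBLE
        |)
        |PARTITIONED BY (load_date STRING)
        |STORED AS PARQUET
        |LOCATION '/data/warehouse/sales_ext'
        |""".stripMargin)

    // Register a newly arrived partition so Hive can see it.
    spark.sql("ALTER TABLE sales_ext ADD IF NOT EXISTS PARTITION (load_date = '2020-01-01')")

    spark.stop()
  }
}
```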
Confidential
Java / J2EE Developer
Environment: Java, J2EE, Apache Tomcat, WebLogic, Web Services
Responsibilities:
- Involved in writing technical design and deployment documents, including use cases and component, class and sequence diagrams.
- Worked on new enhancement module development: ETB (Exist to Bank), smart marketing forms handling and SDT (Secure Data Transfer).
- Generated WSDLs and deployed web services on the Apache Axis SOAP engine.
- Developed software modules using Java, JavaBeans, Hibernate and Spring; deployed the application on WebSphere Application Server.
- Resolved production problem tickets.
- Designed and identified the business logic components.
- Performed analysis, wrote technical spec documents and conducted code reviews.
- Implemented Web services (WSDL) for all Interfaces.
- Developed Java service implementation classes, interfaces and persistence classes using the Hibernate API.
- Implemented web service proxy and Service Layers for MARA Application.
- Implemented exception handling and handled SOAP fault exceptions.
- Extensively worked with and implemented Spring AOP and aspects.
- Implemented Spring interceptors (IoC) using advices.
- Developed My Energy message properties and environment configuration files using Core Java.
- Good expertise in writing shell scripts.
- Reviewed release documentation and was involved in the live production launch.
- Involved in the construction of various components of the system using host system integration services.
